Coding with Windsurf

Optimizing for Cost and Productivity

Hello, I'm Alper 👋

  • Explorer of AI
  • Builder of Things
  • Father of Three

Self-employed Consultant & Developer

Engineering Lead @ Genius Sports

X Logo @alperortac

What is Windsurf?

Windsurf is a VS Code fork with built-in AI capabilities that assist with coding

👇 Me while I'm editing this slide 👇

Windsurf Meta

Why not ... ?

  • Cursor
  • GitHub Copilot
  • Claude Code
  • JetBrains AI
  • Ampcode
  • Antigravity
  • Kilocode
  • Kiro
  • Opencode
  • Codex

Personal preference. All of these are great.

Costs

  • Free: $0/month, 25 credits
  • Pro: $15/month, 500 credits
  • Additional roll-over credits: +250 for $10

Comparison: Opus 4.5

Plan            Claude Code            Windsurf
                # Tokens   Price /mo   # Tokens   Price /mo
Pro (1x)        1.3M       $17         12.5M      $15
Max 100 (5x)    6.7M       $100        12.5M      $15
Max 200 (20x)   27M        $200        25M        $35

Tokens per Dollar: Opus 4.5

Plan            Claude Code   Windsurf   Windsurf advantage
Pro (1x)        76K           833K       11x cheaper
Max 100 (5x)    67K           833K       12x cheaper
Max 200 (20x)   135K          714K       5x cheaper
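The tokens-per-dollar figures follow directly from the monthly prices and token allowances in the comparison above; a quick sketch to reproduce them (plan numbers from the slides, rounding mine):

```typescript
// Tokens per dollar = monthly token allowance / monthly price.
// Plan figures are the ones shown in the comparison slide.
const plans = [
  { name: "Pro (1x)",      claude: { tokens: 1_300_000,  price: 17 },  windsurf: { tokens: 12_500_000, price: 15 } },
  { name: "Max 100 (5x)",  claude: { tokens: 6_700_000,  price: 100 }, windsurf: { tokens: 12_500_000, price: 15 } },
  { name: "Max 200 (20x)", claude: { tokens: 27_000_000, price: 200 }, windsurf: { tokens: 25_000_000, price: 35 } },
];

for (const p of plans) {
  const cc = p.claude.tokens / p.claude.price;     // Claude Code tokens per dollar
  const ws = p.windsurf.tokens / p.windsurf.price; // Windsurf tokens per dollar
  console.log(`${p.name}: ${Math.round(cc / 1000)}K vs ${Math.round(ws / 1000)}K tokens/$ (~${Math.round(ws / cc)}x cheaper)`);
}
// Pro (1x): 76K vs 833K tokens/$ (~11x cheaper)
// Max 100 (5x): 67K vs 833K tokens/$ (~12x cheaper)
// Max 200 (20x): 135K vs 714K tokens/$ (~5x cheaper)
```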

Which Model?

It depends...

  • Changes over time
  • Complexity of task
  • Promo models

Me currently:

Opus 4.5 (4x), Sonnet 4.5 (2x) and SWE 1.5 (free)

Rules

AGENTS.md

+ # Rules
+ Use types, never `any`

In your project root directory

Browser Verification

MCP or CLI to interact with a real browser

Has access to browser console, network tab, lighthouse audits, etc.

Choose Wisely

  • Playwright CLI (best to avoid context rot)
  • Playwright MCP
  • Chrome DevTools MCP (most features)

Rules #2

AGENTS.md

# Rules
Use types, never `any`
+ Use playwright-cli to verify UI changes

Backwards Compatibility

AI:

Let me add backwards compatibility for this legacy code.

Me:

don't, this is just adding tech debt

AI:

I copied it to src/api.new.ts and kept the old file for reference.

Me:

no! just overwrite the existing file!

Rules #3

AGENTS.md

# Rules
Use types, never `any`
Use playwright-cli to verify UI changes
+ Never add backwards compatibility without explicit approval

Writing Tests

AI:

Let me replace the assertion with a console.error() statement.

Me:

no no never make tests less reliable!

AI:

I added a 15% threshold for correctness to let the tests pass.

Me:

HOW ON EARTH IS THAT ACCEPTABLE?!

Rules #4

AGENTS.md

# Rules
Use types, never `any`
Use playwright-cli to verify UI changes
Never add backwards compatibility without explicit approval
+ Never loosen test assertions just to make them pass
+ Failing real tests is better than fake passing tests
+ You can declare tests as ready even if they are failing

Context Window

Most models have a 200k token context window (a few have 1M)

โš ๏ธ 50%
0% Great Okay Degraded 100%

Once >50% full, performance degrades significantly

Distractions that fill context fast:

  • Back & forth chats
  • Reading tons of files
  • Lots of tool calls
  • Inline plans
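A rough way to see how quickly those distractions eat the window is to sum token estimates per event; a hypothetical sketch (the per-event sizes are made-up illustrations, not measured values):

```typescript
// Rough per-event token costs (made-up illustrative numbers, not
// measurements) for the distraction types listed above.
const EVENT_TOKENS: Record<string, number> = {
  chatTurn: 1_500,   // one back & forth message
  fileRead: 4_000,   // one file pulled into context
  toolCall: 2_000,   // one tool invocation + result
  inlinePlan: 3_000, // a plan written into the chat
};

const CONTEXT_WINDOW = 200_000; // typical model context window
const DEGRADE_AT = 0.5;         // performance drops past ~50% full

function contextFill(events: string[]): number {
  const used = events.reduce((sum, e) => sum + (EVENT_TOKENS[e] ?? 0), 0);
  return used / CONTEXT_WINDOW;
}

// A chatty session: 24 turns, 12 file reads, 16 tool calls, 2 inline plans.
const session: string[] = [
  ...Array(24).fill("chatTurn"),
  ...Array(12).fill("fileRead"),
  ...Array(16).fill("toolCall"),
  ...Array(2).fill("inlinePlan"),
];
const fill = contextFill(session);
console.log(`${Math.round(fill * 100)}% full`, fill > DEGRADE_AT ? "degraded" : "ok");
// 61% full degraded
```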

Context Curation

🧹 What can we do to keep the context clean? 🧹

❌ Cluttered Context

💬 Build me a login page
📄 Read auth.ts, user.ts, config.ts...
📝 Here's my 200-line plan...
💬 Actually, change the design
🔧 Running 15 tool calls...
💬 No wait, go back
🧠 Context 78% full

✅ Curated Context

💬 Build login page per spec.md
📋 Read plan stored in spec.md
⚡ Fast Context reads files
🔧 Implementing...
🧠 Context 15% full

📄 Plans in .md files
🔄 New chats frequently
⏪ Revert bad results

Fast Context

Why it's great

  • It's a subagent → context stays clean
  • Extremely fast ⚡

Rules #5

AGENTS.md

# Rules
Use types, never `any`
Use playwright-cli to verify UI changes
Never add backwards compatibility without explicit approval
Never loosen test assertions just to make them pass
Failing real tests is better than fake passing tests
You can declare tests as ready even if they are failing
+ Always use Fast Context when searching for file contents

Interviews

Chat asks for clarifications:

  1. Less guessing
  2. Better requirements

Ask Questions

Interviews #2

Every Opus prompt costs 4 credits, no matter the token count.

💸 "hello" → 4 credits

✅ Detailed prompt → 4 credits

  • 📋 Detailed requirements
  • 🗂️ Codebase context
  • 🧪 Test expectations
  • 📝 Architecture decisions
  • 🔗 Related files
  • ⚙️ Edge cases

Either way the prompt has the same ~200k token horizon to fill.
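Because billing is per prompt rather than per token, front-loading detail is pure efficiency; a quick sketch with the Pro plan numbers from the costs slide:

```typescript
const CREDITS_PER_MONTH = 500;     // Pro plan
const PRICE_PER_MONTH = 15;        // USD
const CREDITS_PER_OPUS_PROMPT = 4; // flat, regardless of token count

// Dollar cost of one Opus prompt, whether it says "hello" or
// carries a full spec.
const dollarsPerPrompt = (PRICE_PER_MONTH / CREDITS_PER_MONTH) * CREDITS_PER_OPUS_PROMPT;
console.log(dollarsPerPrompt); // 0.12

// A vague prompt that needs one correction round costs twice as
// much as a single well-specified prompt.
const vagueExchange = 2 * CREDITS_PER_OPUS_PROMPT; // prompt + correction
const denseExchange = 1 * CREDITS_PER_OPUS_PROMPT; // one dense prompt
console.log(vagueExchange, denseExchange); // 8 4
```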

Interviews #3

The Ask tool fills the prompt token context with requirement gathering

Me: Build a dashboard with charts
AI: Which chart library? Recharts?
Me: Yes, use Recharts with dark theme
AI: What data sources? REST API?
Me: GraphQL, here's the schema...
AI: Should I add filtering?
Me: Yes, date range + category
AI: How should errors be handled?
Me: Toast notifications, retry button
AI: Got it, starting implementation!

Each ask round-trip adds context without spending credits

Rules #6

AGENTS.md

# Rules
Use types, never `any`
Use playwright-cli to verify UI changes
Never add backwards compatibility without explicit approval
Never loosen test assertions just to make them pass
Failing real tests is better than fake passing tests
You can declare tests as ready even if they are failing
Always use Fast Context when searching for file contents
+ Ask before, when blocked mid-task and to confirm unknowns
+ Never guess, never assume or improvise unagreed solutions
+ Only I can declare a task complete

Planning

Communicate intent, not implementation details

โŒ Vague Prompt

"Build me a dashboard"

โœ… Spec-driven

spec.md

Goal: Usage dashboard with daily visitors

Viz: Line chart (Recharts), last 30 days

Data: REST API /api/analytics

Style: Dark theme, Tailwind

Tests: E2E for chart rendering

Simple markdown files and/or plan mode are sufficient

Skill Tree

Apart from Rules there are:

  • Memories
  • Skills
  • MCPs
  • Workflows (Commands)

I use them only sparingly because I prefer progressive improvements.

YMMV. You might want to try one of the existing frameworks.

Troubleshooting

Let AI show you its system prompt

Why didn't you follow the instructions about legacy code?

Which part of the rules makes you write tests like that?

Curiosity > Judgement

Little Helpers

  • Autocomplete
  • Commit Message Generation 👉
  • Lifeguard: reviews code for issues

Magic Wand: If I could have one new Windsurf feature

Subagents with isolated context are the missing piece

🧑 User Prompt → 🪄 Orchestrator → ✅ Result

Orchestrator stages:

  • ❓ Ask (cheap model)
  • 📋 Plan (frontier model)
  • ⚙️ Execute (decent model)
  • 🔍 Review (cheap model)

Each subagent has its own context to avoid pollution
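A minimal sketch of what such an orchestrator could look like; the stage names, model tiers, and `run` stubs below are all hypothetical, since Windsurf has no such feature today:

```typescript
type ModelTier = "cheap" | "decent" | "frontier";

interface Stage {
  name: string;
  model: ModelTier;               // which model tier this subagent uses
  run: (input: string) => string; // stand-in for a real model call
}

// Each subagent only receives the previous stage's *result*, never
// the other stages' intermediate chatter: that's the context isolation.
const pipeline: Stage[] = [
  { name: "Ask",     model: "cheap",    run: (i) => `clarified(${i})` },
  { name: "Plan",    model: "frontier", run: (i) => `plan(${i})` },
  { name: "Execute", model: "decent",   run: (i) => `code(${i})` },
  { name: "Review",  model: "cheap",    run: (i) => `reviewed(${i})` },
];

function orchestrate(prompt: string): string {
  return pipeline.reduce((result, stage) => stage.run(result), prompt);
}

console.log(orchestrate("build login page"));
// reviewed(code(plan(clarified(build login page))))
```

The reduce keeps only each stage's output flowing forward, which is exactly why every subagent starts with a clean context window.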

Next steps

Swiss-cheese model and spec-driven development

Swiss Cheese Model

source: https://www.latent.space/p/reviews-dead

Alper's AI Stack

AI Stack

Building a community for AI enthusiasts

https://aistack.to

Thanks!

Questions?

https://x.com/alperortac