I write about evaluation systems, agentic workflows, and the engineering tradeoffs of building AI products. Most posts are short notes from work: what I tried, what broke, and what I learned.

Recent writing

  • How to Help a Coding Agent

    Coding agents do better when you give them full context, the right tools, and direct feedback from the systems they touch.

  • From 0.5 to 2

    Where eval has the highest leverage in the product lifecycle, and why developer taste matters more than fast automation.

  • How to DO Evals

    Why eval matters, the three conditions that make it trustworthy, and the four parts of an end-to-end eval system.
