This is a real placeholder post.

Use this page for early thoughts about LLM and agent evaluation.

Possible shape

  • What I mean by evaluation
  • Why agent behavior feels harder to judge than normal software behavior
  • What makes an eval useful instead of performative
  • Questions I still have

Notes to myself

Replace this scaffold with real paragraphs when ready.