AI Evaluation on Notes from the Rabbit Hole

https://magnus919.com/tags/ai-evaluation/

Recent content in AI Evaluation on Notes from the Rabbit Hole

© [Magnus Hedemark](https://github.com/magnus919)

[AI Evals 101: Stop the Slop](https://magnus919.com/2026/06/ai-evals-101-stop-the-slop/)
Thu, 11 Jun 2026 14:40:00 -0400

Companies are shipping AI into production with no way to tell if it’s actually working. The slop isn’t a model quality problem, it’s an evaluation problem. Here is the four-rung ladder that turns vibe checks into engineering discipline, with tools you can run today.