<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>AI Evaluation on Notes from the Rabbit Hole</title><link>https://magnus919.com/tags/ai-evaluation/</link><description>Recent content in AI Evaluation on Notes from the Rabbit Hole</description><generator>Hugo</generator><language>en</language><copyright>© [Magnus Hedemark](https://github.com/magnus919)</copyright><lastBuildDate>Thu, 11 Jun 2026 14:40:00 -0400</lastBuildDate><atom:link href="https://magnus919.com/tags/ai-evaluation/index.xml" rel="self" type="application/rss+xml"/><item><title>AI Evals 101: Stop the Slop</title><link>https://magnus919.com/2026/06/ai-evals-101-stop-the-slop/</link><pubDate>Thu, 11 Jun 2026 14:40:00 -0400</pubDate><guid>https://magnus919.com/2026/06/ai-evals-101-stop-the-slop/</guid><description>Companies are shipping AI into production with no way to tell whether it&amp;rsquo;s actually working. The slop isn&amp;rsquo;t a model-quality problem; it&amp;rsquo;s an evaluation problem. Here is the four-rung ladder that turns vibe checks into engineering discipline, with tools you can run today.</description></item></channel></rss>