If you use AI agents seriously, you’ve probably noticed something without giving it a name. Fresh chats start from zero. Switching from Claude to Cursor loses your context. You explain your project for the third time this week and shrug it off as the price of doing business. Researchers call this the memory problem, and the field is in the middle of an argument about how to fix it.
I’ve spent the past year or more building memory systems for AI agents in the background of my day job. Tri-modal architectures running across three databases at once, taxonomies of specialized worker agents I called cognitive minions, decay algorithms for letting old beliefs fade, session-memory graphs that kept track of what we were doing across days. Most of that work never left my own laptop. Then a few weeks ago I came across a small open-source project with eight stars on GitHub and a sharper philosophy than most well-funded products in the space. I wrote an integration layer for it. The result is called hermes-cashew, and it helped me appreciate something about agent memory that validated what I was seeing inside my own bespoke prototypes.
This post is for people who use AI tools every day and want to think more clearly about why some of them feel like collaborators and most of them feel like a recurring blank slate.
Three bets the field is making, and the one you’ve already picked
Underneath all the marketing, every coding agent and chat assistant is making a bet about what kind of memory it should have. There are roughly three options on the table, and you’ve already picked one without anyone telling you it was a choice.
The first is the fresh-chat bet. Every session is new. Memory is bolted on as a feature, not built in as a foundation. Claude, ChatGPT, and most chat-style assistants live here. Memory exists, as a folder somewhere you can browse or a few facts the assistant tries to surface, but the architecture assumes each conversation begins again. The wager is that better models in any given turn will outpace any benefit you’d get from carrying state forward.
The second is the one-personality bet. The agent has an identity. A voice, preferences, a relationship with you that grows over time. OpenClaw is the most prominent example. Personality, memory, and capability are tangled into a single artifact, and the bet is that what you actually want is a coherent character that knows you, not a tool that just performs.
The third is the stack-of-skills bet. Capability lives in artifacts you can carry between tools. AGENTS.md files. Agent skills. Each one is small, named, portable, and survives the conversation. Skills compound; conversations evaporate. The bet here is that what you really want is a personal toolkit, the way an engineer builds one up over years.
These bets aren’t exclusive. A serious user might end up running all three at once. But they pull in different directions, and most of the friction you feel when working with AI tools comes from the fact that the tools themselves haven’t decided which one is load-bearing.
What I’d been building, and why none of it was shippable
Before I get to Cashew and the packaging discipline that made me write this post, I owe you a little context. I’ve been working on agent memory long enough to have strong opinions and the bruises to go with them.
The longest-running line of work was a tri-modal memory architecture: three databases, each playing to its strengths. A graph database for relationships, a time-series database for temporal patterns, a vector database for semantic similarity. The thesis: any one of those modalities on its own gives you a thin slice of cognition; all three together create capabilities that single-modal systems can’t replicate. Picture an assistant that can answer “what was I working on the last three Tuesdays, and how does that connect to the question I’m asking right now,” with the temporal pattern coming from one store, the semantic recall coming from another, and the relationships between people and projects coming from a third.
Three databases means you need something to stitch them together, so I wrote what I called cognitive minions: specialized worker agents, each tuned to a particular kind of cross-modal query, each responsible for combining results across stores. Some did temporal correlation. Others did meta-cognitive self-optimization, watching how the system performed and tuning its own retrieval. A few did semantic graph walking. The architecture had momentum, and a lot of careful thought behind it.
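To make that concrete, here’s a rough sketch of the kind of routing one of those minions did. The store clients and method names below are stand-ins I made up for illustration, not the API of any real graph, time-series, or vector database; the point is only how three partial answers get combined into one.

```python
def what_was_i_doing(question, graph, timeseries, vectors):
    """Hypothetical cross-modal query: which store answers which part.

    graph / timeseries / vectors are stand-ins for the three clients;
    none of these method names come from a real library.
    """
    # Temporal pattern: sessions that recur on the same weekday.
    tuesday_sessions = timeseries.recurring_sessions(weekday="Tuesday", lookback_weeks=3)
    # Semantic recall: stored thoughts similar to the question being asked now.
    related_thoughts = vectors.search(question, top_k=5)
    # Relationships: people and projects connected to those recurring sessions.
    connections = graph.neighbors(ids=[s["id"] for s in tuesday_sessions],
                                  edge_types=["works_on", "mentions"])
    return {
        "recurring_work": tuesday_sessions,
        "related_thoughts": related_thoughts,
        "connections": connections,
    }
```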
In a later, simplified iteration, I built a session-memory system on top of LadybugDB. It’s a typed graph that tracks entities, relationships, commitments, and hypotheses across days and across projects. It’s the system that today knows what we’re working on across sessions, what’s open, what’s closed, and where the threads were last left. The special thing about this iteration was that it could procrastinate on purpose: defer a commitment, then follow through on it in a later session or even a scheduled cron job.
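The data model is simpler than the description makes it sound. Here’s a rough sketch of the node types in plain Python dataclasses; the field names are illustrative, not LadybugDB’s schema or my real one.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Entity:
    name: str
    kind: str                    # "person", "project", "tool", ...

@dataclass
class Commitment:
    description: str
    status: str = "open"         # "open", "done", or "deferred"
    due: date | None = None
    project: str | None = None

@dataclass
class Hypothesis:
    claim: str
    confidence: float = 0.5      # revised as evidence accumulates across sessions

def session_start_agenda(commitments: list[Commitment]) -> list[Commitment]:
    """What a new session (or a scheduled cron job) picks back up:
    anything deferred earlier whose due date has arrived."""
    today = date.today()
    return [c for c in commitments
            if c.status in ("open", "deferred") and (c.due is None or c.due <= today)]
```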
These systems work. I run pieces of them every day. But here’s the catch, and it’s the catch I want you to see clearly: none of it was shippable. Careful wiring. Bespoke worker agents. Configuration that lived in my head. If you wanted what I had, you needed me to come over and set it up, and even then you’d inherit a stack you couldn’t maintain on your own.
I knew this was a problem. I hadn’t put in the work to solve it. The systems kept getting more sophisticated and less portable.
Finding Cashew
A few weeks ago I came across a project on GitHub called Cashew, built by Raj Kripal Danday. It has eight stars at time of writing. Its README opens with a story about Raj, as a kid in India, asking his aunt whether cats eat cashews, then poses the question that’s been rattling around his head ever since: what if forgetting is the intelligence?
Cashew is smaller than what I’d been building. One Python package. One SQLite file. Local embeddings. Vector retrieval feeding a recursive graph walk. Organic decay built into the substrate, so low-value knowledge fades while important patterns strengthen through use. There’s a think-cycle protocol that runs in the background, finding cross-domain connections nobody asked for. There’s no service to run. No three-database orchestration. Five minutes from pip install cashew-brain to a working brain.
The first time I read the README I felt validated. Most of the architectural ideas were ones I’d been working with for a year: graph plus vector retrieval, organic decay rather than explicit deletion, autonomous synthesis cycles, single-database simplicity over multi-store complexity. I had stronger versions of several of them. But I’d built a workshop. Raj had built something other people could pick up off the shelf.
That difference is the entire point.
The second time I read the README I started writing an integration. Cashew already had skills for Claude Code and OpenClaw. I run Hermes Agent for a particular kind of long-running task, and I wanted Cashew to be Hermes’s memory provider. So I wrote one. It’s called hermes-cashew, and v0.2.0 is what’s on GitHub today.
hermes-cashew, honestly
The name overpromises what I actually built, so let me be precise. Cashew is the brain. hermes-cashew is the small piece of glue that lets Hermes Agent use it. I didn’t invent the architecture. I wrote the integration layer.
One reason this integration was even possible: Hermes Agent ships with a memory provider interface. That’s a feature you don’t see in every coding-agent harness. A memory provider in Hermes is a plug-in point. You can swap out the default memory subsystem for any backend that implements the contract. Hermes ships with several providers and supports several more; my job was to add a new one that points at Cashew. The interface is what made the work tractable. In a harness without that contract, integrating an external thought-graph would mean patching internals or maintaining a fork. With it, hermes-cashew is genuinely a plugin. That design choice is part of why I’m building research-leaning memory work on top of Hermes specifically. The harness lets a thoughtful memory subsystem be a drop-in rather than a rewrite.
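For intuition, here’s roughly the shape of such a contract. The method names are my paraphrase of the idea, not Hermes’s actual interface; the point is that any backend implementing the methods can slot in.

```python
from typing import Protocol

class MemoryProvider(Protocol):
    """Paraphrased provider contract; not Hermes's real interface."""

    def on_session_start(self, task: str) -> str:
        """Return context to prefetch into the system prompt."""
        ...

    def query(self, text: str, top_k: int = 5) -> list[str]:
        """Retrieve stored knowledge relevant to the text."""
        ...

    def extract(self, text: str) -> None:
        """Store new knowledge distilled from the conversation."""
        ...

class CashewBackedProvider:
    """A backend satisfies the contract just by implementing the methods;
    a real version would delegate each call to the cashew-brain package."""

    def on_session_start(self, task: str) -> str:
        return ""                       # e.g. the top thoughts related to the task

    def query(self, text: str, top_k: int = 5) -> list[str]:
        return []

    def extract(self, text: str) -> None:
        pass
```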
What hermes-cashew currently ships, in v0.2.0:
- A Hermes plugin that registers Cashew as a memory provider
- Two LLM-accessible tools, cashew_query and cashew_extract, that let the agent search and store knowledge in the graph
- Session-start prefetch that pulls relevant context into the system prompt
- Semantic search via sqlite-vec when available, with keyword fallback when it’s not
- Recursive BFS graph traversal so the agent can follow chains of related thoughts (sketched just after this list)
- A configuration surface with thirty-one settings, all of which have sane defaults
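The two retrieval bullets above (semantic search with a keyword fallback, then a recursive walk outward from the hits) boil down to a pattern like this. It’s a sketch of the pattern over a generic SQLite edges table, not hermes-cashew’s actual code.

```python
import sqlite3
from collections import deque

def related_thoughts(db: sqlite3.Connection, seed_ids: list[int], max_depth: int = 2) -> set[int]:
    """Breadth-first walk outward from the vector-search (or keyword-fallback) hits,
    so the agent can follow chains of related thoughts.
    Assumes an edges(src, dst) table; the schema here is illustrative."""
    seen: set[int] = set(seed_ids)
    frontier = deque((node_id, 0) for node_id in seed_ids)
    while frontier:
        node_id, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        for (neighbor,) in db.execute("SELECT dst FROM edges WHERE src = ?", (node_id,)):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen
```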
What hermes-cashew doesn’t yet ship is more interesting than what it does. The v0.3 milestone (think cycles and sleep protocols, real decay scoring, deduplication, the warm daemon that makes queries fast enough to feel interactive) is open issues. v0.4 (a novelty gate that rejects near-duplicates at write time, source extractors for Obsidian and other knowledge sources, domain separation between user and agent memory) is open issues. v0.5 (privacy tagging, a dashboard for visualizing what the brain is thinking, traversal operations for auditing why the agent surfaced a particular memory) is open issues too.
Put another way: the parts of Cashew that are most distinctive (the engineered forgetting, the think cycles, the autonomous synthesis) are working in cashew-brain itself today; my Hermes integration exposes the retrieval primitives and not yet the brain operations. If you run Cashew through Claude Code or OpenClaw, you get the full thing. If you run it through Hermes via my plugin, you get retrieval and a roadmap.
I’m being this specific because I want you to see the shape of the work clearly, not because I’m proud of the gap. The retrieval layer matters; the brain operations matter more. I’m working through them in order.
What the research says, and the line that turned me
While I was building, I went back into the research literature to make sure the ideas I was acting on weren’t just my own taste. The most useful update came from a paper called Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory, which works through a careful study of where long-running memory systems actually fail. The headline finding, translated into plain language: how you find old context matters far more than how you save it. On a benchmark called LoCoMo, accuracy varied by twenty points depending on how the system retrieved memories and by only three to eight points depending on how it extracted and summarized them at write time. The expensive cognitive machinery on the saving side, the part where the system tries to be clever about what to keep, is mostly wasted effort compared to investing in better retrieval.
That finding cuts against the instinct most of us have when we hear “AI memory.” We assume the trick is getting the assistant to remember more, more carefully, with more structure. The research says the trick is making what’s already saved easier to find, and being honest about letting old beliefs fade.
A related strand of research has been working out the other half: forgetting as a design feature, not a failure. FadeMem demonstrated that biologically-inspired decay (memories fading on a curve, with access strengthening the ones that get used) preserved over eighty percent of important facts while using only fifty-five percent of the storage. APEX-MEM, Banerjee and colleagues’ property-graph approach to temporal memory, hit nearly eighty-nine percent on a long-context QA benchmark. The pattern across these papers is consistent: pruning is the work; hoarding is what happens when you skip the work.
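For a concrete picture of what decay-with-reinforcement means, here’s a minimal sketch of the general idea; it isn’t FadeMem’s scoring function or Cashew’s, just the shape the papers describe.

```python
import math
import time

HALF_LIFE_DAYS = 30.0   # illustrative; real systems tune this per memory type

def retention_score(importance: float, last_access_ts: float, access_count: int,
                    now: float | None = None) -> float:
    """Memories fade on a curve; each access strengthens the ones that get used.
    Anything that falls below a threshold becomes a candidate for pruning."""
    now = time.time() if now is None else now
    age_days = (now - last_access_ts) / 86_400
    decay = math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)   # exponential half-life
    reinforcement = 1.0 + math.log1p(access_count)               # diminishing returns on reuse
    return importance * decay * reinforcement
```

The half-life and the log reinforcement are arbitrary here; the point the papers make is the shape of the curve: unused memories drift toward the pruning threshold, used ones climb away from it.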
Ben Nweke, in a recent essay, put the failure mode in a sentence I keep coming back to: the problem isn’t remembering; it’s failing to let go. He tells the story of switching from poetry to uv as a Python dependency manager, mentioning it once in passing to his assistant. The assistant kept suggesting poetry commands for a week, because the old preference had accumulated months of high-importance reinforcement and the new one was a single low-importance note. No safeguard caught the supersession. That isn’t a memory problem. It’s a forgetting problem.
This is exactly the territory Raj’s “what if forgetting is the intelligence” question is pointing at, and it lines up with where the academic work is heading. The interesting frontier in agent memory isn’t on the storage side. It’s on the curation side.
Why I still do most of my daily work in OpenCode
I should say plainly: I built hermes-cashew, but I do most of my day to day agentic work in OpenCode, not in Hermes. The reason isn’t that one is better than the other; my workflow had grown and evolved with the Claude Code harness, and OpenCode was a more natural migration path from there. Hermes plus Cashew is the longer bet. The question I’m trying to answer with hermes-cashew is what an agent looks like at session one thousand, not at session one. Both can be true at the same time. They’re different jobs.
The counterpoint I keep in front of me
Not everyone agrees with all of this. The smartest counterargument I’ve read recently came from Aristidis Vasilopoulos, a single developer who shipped a 108,000-line C# distributed system in under seventy days using Claude Code as his sole code generator. His memory architecture for that work was almost embarrassingly simple. A “hot-memory constitution” of about 660 lines of markdown. Nineteen specialized agent profiles, also markdown. Thirty-four cold-memory specification documents. No graph database, no decay algorithm, no temporal correlation. Just well-curated files and disciplined indexing.
His system works. It works at production scale. It works on real code that real engineers ship every day. The lesson I have to keep in front of me: the fundamentals haven’t changed. Write down the things that matter. Keep them current. Make them findable. The fancy memory architectures the field is building, including the one I just spent a year on, are mostly trying to automate a discipline that doesn’t require automation if the team is willing to do the work.
If you’re reading this and feeling like everything I just described sounds overengineered, Vasilopoulos is on your side. Most days, he’s on my side too.
How this got built
I run engineering at the executive level for my day job. My hands-on coding muscle isn’t what it was a decade ago, and the kind of careful upstream watching that prevents schema drift between two related projects is genuinely something I have to work to maintain. hermes-cashew is a labor of love that gets evenings and weekends on my own computer, with OpenCode driving the Kimi-K2.6 model through most of the actual code authoring.
The velocity gain is real. I couldn’t have built the v0.2.0 release in the time I had without AI-accelerated development. There were also amusing failure modes that came directly from the way AI-assisted sessions work. At one point I’d reimplemented portions of cashew-brain’s own schema-management logic inside hermes-cashew without realizing it. The AI session didn’t have the upstream code loaded, so it cheerfully recreated functionality that already existed. The result was schema drift between the two projects, a real bug, and a small amount of egg on my face. It’s fixed now. It’s the kind of bug a careful human reviewer with the full upstream context would have caught immediately, and that an AI session won’t catch unless you’ve set it up to.
The honest read on AI-accelerated development, for someone like me doing serious indie work around an executive day job: it’s genuinely transformative for shipping pace, and it’s not yet a substitute for taste, context awareness, or upstream discipline. You still have to know what you’re doing. The tools just let you do more of it per hour.
Five things you can do this week, even if you never run any of this
If you’ve read this far you probably want some takeaways. Five of them.
One: write your reusable workflows down as artifacts, not chats. When you find yourself explaining the same context to your assistant for the third time, that’s a signal. Save it. A Cursor rule, an AGENTS.md, a custom Claude project, a Hermes skill, a folder of markdown templates you paste in. Anywhere portable. Skills compound across sessions and tools. Conversations evaporate at the end of the tab.
Two: look for tools whose authors have written about what to forget, not just what to remember. This is a heuristic, not a rule, but it’s been weirdly reliable. If a tool brags about how much it remembers, ask what it forgets. Cashew is one example of a tool whose authors have thought carefully about what to let go; the academic systems coming out of recent research are others. They tend to behave noticeably better over time than tools that just announce “we have memory!” Read the philosophy file. It tells you a lot.
Three: write context to be retrieved, not to be read. When you save reusable context, structure it for retrieval. Short, tagged, specific, easy to find. Long narrative documents that explain everything are harder for the assistant to use than short structured ones. The research is consistent on this: how findable your context is matters more than how completely you wrote it.
Four: prune. Manually, for now. The tools aren’t doing engineered forgetting yet. Including mine; my plugin’s garbage collection is essentially random deletion at a one-percent rate. That isn’t forgetting; that’s hoping. Until the next generation of tools ships real decay and supersession, you’re the curator. Schedule the time. Delete the stale. Mark the superseded. The gain compounds.
Five: the bottleneck right now is packaging, not research. This one is the post in a sentence. The interesting agent-memory work is happening in small projects with sharp philosophies and limited audiences. If you’ve built a memory system for yourself that works (and a surprising number of you have), the field needs you to package it well enough that someone else can run it in five minutes. That’s the move I’m trying to make with hermes-cashew, and it’s the move Cashew itself made before me. Distribution is where the leverage is.
Closing
The research is converging. The forgetting question is the one worth asking. The architectures we know how to build are mostly waiting on the discipline to package them in ways other people can run. If you’ve been quietly building cognitive infrastructure for your own use, or thinking about it, take this as an invitation. Pick a thing you have running. Make it installable. Write a README that explains the philosophy. Ship it.
The brain you build for yourself is interesting. The brain you ship to other people is what changes the field.
