How to serve Qwen3.6-35B-A3B, a Mixture-of-Experts model with 3B active parameters, on an RTX 5070 Ti using llama.cpp. Full config, performance numbers, and the flags that make it fit.
Running a 35B MoE Model on a 16GB Consumer GPU


How to serve Qwen3.6-35B-A3B, a Mixture-of-Experts model with 3B active parameters, on an RTX 5070 Ti using llama.cpp. Full config, performance numbers, and the flags that make it fit.

Three research communities, three funding streams, three separate literatures. A sensor that could exist has been proposed for nine years and nobody has built it, because the people with the answers don’t read each other’s work.
A jumping spider’s depth perception system has been modeled, validated, and explicitly proposed as the blueprint for a sensor cheaper than LIDAR and simpler than stereo vision. Every physical primitive exists in a fab. So why hasn’t anyone built it?

A jumping spider with fewer than 100,000 neurons plans hour-long hunting routes, learns from experience, rotates mental images, and deceives its prey — challenging everything we thought intelligence required.

My Big Mouth Billy Bass is in pieces on the workbench. The teardown is done, the components are identified, and I haven’t decided what to build yet. That’s the point.

There is a difference between falling and floating. Between surrender and release. The counterintuitive skill for the age of AI might just be learning to be comfortable in suspension.

I used to be a tmux power user. Years later, running multiple Hermes Agent sessions at once, I needed it back. Here’s how I went from zero to power user in one sitting.

The engineers most threatened by AI aren’t the ones who can’t learn it. They’re the ones who can’t tolerate being bad at it. The real bottleneck is courage, and courage is a memory, not a trait.

The ikigai framework assumes a stable, unified self. For a neurodivergent person whose identity is genuinely multi-instance, any single-point answer is a reduction.

The 40-hour week was designed for factory work in 1926. Most of us don’t do factory work anymore. Three speculative role-plays show why one schedule cannot serve all jobs.

A five-sentence Discord comment revealed a design philosophy I’m calling ‘architecture by leverage’ — choose infrastructure that already solves N problems, then only build what the gaps require. Here’s what it looks like in practice.

The complete guide to Hermes Agent’s multi-agent kanban system: durable task boards, the dispatcher loop, worker lifecycle, orchestrator patterns, and production safety engineering for coordinating AI agents across sessions

A practical guide to installing and configuring hermes-cashew, the thought-graph memory provider for Hermes Agent. Five minutes from zero to a working semantic brain with recursive graph retrieval, think cycles, and sleep consolidation.