Posts for: #Llama-Cpp

Running a 35B MoE Model on a 16GB Consumer GPU

2026-05-27

How to serve Qwen3.6-35B-A3B, a Mixture-of-Experts model with 3B active parameters, on an RTX 5070 Ti using llama.cpp. Full config, performance numbers, and the flags that make it fit.