How to serve Qwen3.6-35B-A3B, a Mixture-of-Experts (MoE) model with 3B active parameters per token, on an RTX 5070 Ti using llama.cpp. Full config, performance numbers, and the flags that make it fit.
Running a 35B MoE Model on a 16GB Consumer GPU
