Moe-Model

Alibaba just handed the open-source AI community something remarkable: a model that scores 73.4% on SWE-bench Verified — one of the most demanding real-world software engineering benchmarks — while activating only 3 billion parameters per token during inference. Meet Qwen3.6-35B-A3B, released April 17 under the Apache 2.0 license. The Architecture: Sparse MoE Done Right Qwen3.6-35B-A3B is a Mixture of Experts (MoE) model with 35 billion total parameters, but that number is almost misleading for practical purposes. At inference time, the model activates only 3 billion parameters per token — roughly the compute footprint of a much smaller model, with the knowledge capacity of something far larger. ...