Mellum2-12B-A2.5B-Thinking is a reasoning-augmented code assistant from JetBrains, built on a Mixture-of-Experts architecture with 12B total parameters and a 131K-token context window. It emits its chain-of-thought reasoning trace inside dedicated tags before the final answer, which makes it well suited to debugging and multi-step planning.
Highlights
Supports a 131,072-token context window using a hybrid of sliding-window and full-attention layers.
Serving with vLLM requires a nightly build (post-v0.22.0) for MellumForCausalLM support, together with a reasoning parser such as qwen3 to handle the reasoning trace.
JetBrains recommends the Instruct variant for low-latency direct answers, reserving the Thinking variant for complex reasoning tasks.
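
Because the Thinking variant puts its reasoning trace in front of the final answer, client code that handles raw output usually has to separate the two. The sketch below shows one way to do that in plain Python. It assumes the trace is wrapped in <think>...</think> delimiters, the convention associated with the qwen3 parser; the source does not name the tags, so treat the delimiters and the helper name split_reasoning as illustrative assumptions rather than the model's documented format.

```python
import re

# Assumption: the reasoning trace is delimited by <think>...</think>.
# Adjust the pattern if the model's chat template uses different tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from raw model output.

    If no reasoning block is found, reasoning is the empty string and
    the whole (stripped) text is treated as the answer.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first reasoning block is the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    raw = "<think>Check the loop bound.</think>\nUse range(n + 1)."
    reasoning, answer = split_reasoning(raw)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

When the model is served through vLLM with a reasoning parser enabled, this separation is expected to happen server-side, so a helper like this is mainly useful for raw completions or offline logs.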