NVIDIA Nemotron-3-Super is a 120B-parameter, open-weight hybrid mixture-of-experts (MoE) model with 12B active parameters per token. It is designed for multi-agent AI and long-context reasoning tasks and has a context window of up to 1M tokens.
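The gap between total and active parameters is what makes an MoE model cheaper to run per token than a dense model of the same size. The minimal Python sketch below works through that arithmetic. The 120B and 12B figures come from the description above. The helper names are hypothetical, and the "about 2 FLOPs per active parameter per token" rule is a common rough approximation for a forward pass, not a measured figure for this model.

```python
# Rough per-token arithmetic for a mixture-of-experts (MoE) model.
# Assumption: the "2 FLOPs per parameter" rule for a forward pass, applied
# to active parameters only. It ignores attention-over-context cost and
# router overhead, so treat the result as an order-of-magnitude estimate.

TOTAL_PARAMS = 120e9   # total parameters stored in the model (from the text)
ACTIVE_PARAMS = 12e9   # parameters used for each token (from the text)


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights that take part in one token's forward pass."""
    return active / total


def approx_forward_flops_per_token(active: float) -> float:
    """About one multiply and one add per active parameter per token."""
    return 2 * active


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    flops = approx_forward_flops_per_token(ACTIVE_PARAMS)
    print(f"active fraction: {frac:.0%}")               # 10%
    print(f"approx. forward FLOPs/token: {flops:.1e}")  # 2.4e+10
```

Only about a tenth of the weights are used per token, but all 120B still have to be stored, which is why memory rather than compute is usually the limiting factor for running such a model locally.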
The model can run locally on devices with 64GB of RAM/VRAM and can be fine-tuned with Unsloth; quantized versions (4-bit and 8-bit) are available for different hardware configurations. It achieves higher throughput than comparable models such as GPT-OSS-120B and Qwen3.5-122B, and its inference configuration supports context lengths of up to 262,144 tokens (256K).
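As a rough sanity check on the 64GB figure, the sketch below estimates the size of the raw weights of a 120B-parameter model at several bit widths. It assumes every weight is stored at one uniform precision and counts nothing else: real quantized checkpoints add per-block scale factors and may keep some layers at higher precision, and inference also needs memory for the KV cache and activations. The function name and the 64 GB budget constant are illustrative.

```python
# Back-of-the-envelope weight memory for a 120B-parameter model.
# Assumptions: uniform bit width for all weights; no quantization
# metadata, KV cache, or activation memory is included.

TOTAL_PARAMS = 120e9
BUDGET_GB = 64  # local RAM/VRAM budget from the text (decimal GB)


def weight_gb(params: float, bits: int) -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_gb(TOTAL_PARAMS, bits)
        verdict = "fits within" if size <= BUDGET_GB else "exceeds"
        print(f"{bits:>2}-bit: {size:6.1f} GB -> {verdict} {BUDGET_GB} GB")
```

Under these assumptions only the 4-bit weights (about 60 GB) fit in 64 GB, with little headroom left for the KV cache, which grows with context length; the 8-bit version (about 120 GB) calls for larger hardware.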