The Hugging Face repo provides the Kimi‑K2‑Instruct model in GGUF format for local inference, detailing hardware needs, quantization guidance, and performance highlights.
Highlights
Mixture‑of‑Experts LLM with 1 trillion total parameters and 32B active parameters
Quantized GGUF files run efficiently on consumer GPUs using a llama.cpp fork
Recommended minimum of 128 GB unified RAM for small‑quant runs and 16 GB of VRAM for acceptable speed
Guidance covers temperature settings, token limits, and the number of experts activated per token
Offers both base and instruction‑tuned variants for research and chat use
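The hardware figures above can be sanity‑checked with back‑of‑the‑envelope arithmetic: memory footprint is roughly parameters × bits‑per‑weight ÷ 8. A minimal sketch, assuming illustrative bits‑per‑weight values (real GGUF quant types add per‑block scale overhead, and llama.cpp can mmap weights from disk, so usable RAM below the full file size can still work):

```python
def quantized_gb(params: float, bits_per_weight: float) -> float:
    """Approximate memory footprint in GB for `params` weights
    stored at `bits_per_weight` bits each (overhead ignored)."""
    return params * bits_per_weight / 8 / 1e9

total_params = 1e12   # ~1 trillion total parameters (all experts)
active_params = 32e9  # ~32B parameters active per token

# Full weight set at an assumed ~2.5 bits/weight small quant:
print(f"total  @ ~2.5 bpw: ~{quantized_gb(total_params, 2.5):.0f} GB")
# Active working set at an assumed ~4 bits/weight quant:
print(f"active @ ~4 bpw:   ~{quantized_gb(active_params, 4):.0f} GB")
```

The active working set at ~4 bits per weight lands near the 16 GB VRAM figure, while even a small quant of the full expert set is far larger than system RAM, which is why MoE models of this size lean on memory mapping and CPU/GPU offload.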
Context
Audience: Developers and researchers interested in running large MoE language models locally
Domain: Artificial Intelligence
Format: model checkpoint in GGUF format with documentation