This Hugging Face repo provides the Kimi‑K2‑Instruct model in GGUF format for local inference, with documentation covering hardware requirements, quantization guidance, and performance highlights.
Highlights
Mixture‑of‑Experts LLM with 1 trillion total parameters, of which roughly 32 B are active per token
Quantized GGUF files run efficiently on consumer GPUs using a llama.cpp fork
Recommends at least 128 GB of unified RAM for the smallest quants and 16 GB of VRAM for acceptable generation speed
Guidance covers recommended sampling temperature, output token limits, and the number of experts selected per token (see the sketch after this list)
Offers both base and instruction‑tuned variants for research and chat use
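To make the sampling guidance above concrete, here is a minimal sketch using the llama-cpp-python bindings. It assumes the bindings are built against a version of llama.cpp compatible with the fork the repo recommends (mainline builds may not load this architecture); the shard filename, quant level, context size, temperature, and token limit are illustrative placeholders, not values taken from the repo.

    # A minimal sketch, assuming a llama-cpp-python build compatible with the
    # llama.cpp fork the repo recommends for this model.
    from llama_cpp import Llama

    # Hypothetical shard name and quant level; point this at the first shard
    # of whichever quantization you downloaded, and the rest load automatically.
    llm = Llama(
        model_path="Kimi-K2-Instruct-Q2_K-00001-of-00008.gguf",
        n_ctx=8192,        # context window; raise it if memory allows
        n_gpu_layers=-1,   # offload as many layers as fit into VRAM
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain mixture-of-experts routing in two sentences."}],
        temperature=0.6,   # placeholder; use the temperature the model card recommends
        max_tokens=256,    # cap the reply length
    )
    print(out["choices"][0]["message"]["content"])

On machines with limited VRAM, a common approach is to keep the large expert tensors in system RAM while the dense layers stay on the GPU, which is what the unified-RAM recommendation above accounts for.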
Context
Audience
Developers and researchers interested in running large MoE language models locally
Domain
Artificial Intelligence
Format
Model checkpoint in GGUF format with documentation