Fork of llama.cpp for C/C++ LLM inference with TurboQuant low-bit KV cache and weight compression.
Supports Gemma 4 MTP and Qwen 3.6 NextN speculative decoding, with claimed throughput gains of about 30–50%. Includes install, Docker, binary release, and source build instructions.
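
For readers new to speculative decoding, the draft-then-verify idea behind the throughput claim can be sketched as below. This is a minimal illustration with greedy decoding, not this fork's implementation: `speculative_step`, `toy_target`, and `toy_draft` are hypothetical names, and the toy "models" are plain functions over integer tokens. It assumes the MTP and NextN heads act as the draft proposer, which the summary above does not spell out.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_step(target_next: NextToken, draft_next: NextToken,
                     context: List[int], num_draft: int) -> List[int]:
    """One greedy draft-then-verify step; returns the tokens accepted this step."""
    # 1. The cheap draft proposes num_draft tokens autoregressively.
    proposed: List[int] = []
    for _ in range(num_draft):
        proposed.append(draft_next(context + proposed))

    # 2. The target checks each proposal. A real engine scores all positions
    #    in one batched forward pass; here we call it per position for clarity.
    accepted: List[int] = []
    for tok in proposed:
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft matched, so the same target pass yields one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


def toy_target(ctx: List[int]) -> int:
    # Hypothetical "large model": counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Hypothetical "small model": agrees with the target except after token 3.
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step(toy_target, toy_draft, [0], num_draft=4))
```

Under greedy decoding the accepted tokens are exactly what the target would have produced alone, so any gain comes from verifying several positions per target pass. How large that gain is depends on how often the draft agrees with the target.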