William's Bookmark Library

GitHub - antirez/ds4: DeepSeek 4 Flash local inference engine for Metal

github.com · Saved May 8, 2026 · 15 min

antirez · via GitHub

Summary

ds4 is a native Metal inference engine designed specifically for DeepSeek V4 Flash. It offers optimized performance through efficient parameter usage, thinking sections that are dramatically shorter and proportional to problem complexity, and support for context windows of 1 million tokens.

The engine features highly compressed KV caches that enable long-context inference on local machines such as MacBooks with 128 GB of RAM, works efficiently with 2-bit quantization, and includes an HTTP API for integration. The project prioritizes end-to-end functionality, with validation against official logits and agent integration testing, rather than generic GGUF support. Reported benchmarks show 468 tokens/second prefill on a Mac Studio M3 Ultra.

Topics

LLM Optimization · DeepSeek V4 Flash · Local Inference Engine · Metal GPU · GGUF Quantization
