This GitHub repo offers Docker configs for running LLMs on RTX 3090s using vLLM or llama.cpp. It focuses on Qwen3.6-27B, supporting single or dual GPU setups with OpenAI-compatible APIs and detailed benchmarks.
Highlights
Multi-engine support for vLLM, llama.cpp, and SGLang.
Optimized for RTX 3090 hardware in 1x, 2x, or cluster setups.
Production-ready configs for Qwen3.6-27B model.
Includes OpenAI-compatible API via Docker Compose.
Provides scaling docs and performance benchmarks.
auto-generated
noonghunna · via GitHub
Context
Audience
AI developers and homelab enthusiasts deploying local LLMs