Running Google Gemma 4 Locally With LM Studio’s New Headless CLI & Claude Code

ai.georgeliu.com · Saved April 6, 2026 · 19 min · April 4, 2026

George Liu · via George Liu

Summary

LM Studio 0.4.0 introduces llmster and the lms CLI for headless local inference, used here to run Google Gemma 4 26B-A4B on macOS.

The Mixture-of-Experts (MoE) model activates 3.8B parameters per token and scores 82.6% on MMLU Pro. On a 48GB MacBook Pro (M4 Pro) it generates 51 tokens/second. Setup involves installing the lms CLI, starting the daemon, and loading the Q4_K_M quantized model for use with Claude Code.
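To put the summary's figures in proportion, here is a rough back-of-the-envelope sketch in Python. It assumes that the "26B" in the model name is the total parameter count and that a Q4_K_M file averages roughly 4.5 bits per weight. Both are assumptions for illustration, not numbers from the article, and the function names are hypothetical.

```python
# Figures quoted in the summary.
TOTAL_PARAMS_B = 26.0      # assumed total parameters in billions (read off the "26B" in the name)
ACTIVE_PARAMS_B = 3.8      # parameters activated per token, in billions
TOKENS_PER_SECOND = 51.0   # reported generation speed

# Assumption, not from the source: average bits per weight for a 4-bit K-quant.
ASSUMED_BITS_PER_WEIGHT = 4.5


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for each generated token."""
    return active_b / total_b


def weight_memory_gb(total_b: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes); ignores KV cache and runtime overhead."""
    return total_b * bits_per_weight / 8.0


def ms_per_token(tokens_per_second: float) -> float:
    """Average wall-clock time spent per generated token, in milliseconds."""
    return 1000.0 / tokens_per_second


if __name__ == "__main__":
    print(f"active share per token: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")
    print(f"approx. weight memory:  {weight_memory_gb(TOTAL_PARAMS_B, ASSUMED_BITS_PER_WEIGHT):.1f} GB")
    print(f"time per token:         {ms_per_token(TOKENS_PER_SECOND):.1f} ms")
```

Under these assumptions, only a small share of the weights is touched per token, which is consistent with a model of this size reaching interactive speeds on a laptop, while the full set of weights still has to fit in memory.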

Topics

Local AI Inference · Mixture-of-Experts Models · Google Gemma 4 · LM Studio CLI · macOS LLM Deployment
