RedHatAI released a preliminary EAGLE-3 speculator model for Gemma-4-31B-it to accelerate inference via vLLM. Trained on Magpie and UltraChat data, it improves token acceptance rates across coding, math, and QA benchmarks.
Highlights
Uses EAGLE-3 speculative decoding for Gemma-4-31B-it.
Released by RedHatAI as a preliminary v1.0 model.
Optimized for vLLM with num_speculative_tokens=3.
Shows improved acceptance lengths on HumanEval and math reasoning benchmarks.
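
As a rough illustration of the vLLM setting mentioned above, the following sketch builds a speculative-decoding configuration with num_speculative_tokens=3. It assumes a recent vLLM release that accepts a `speculative_config` dictionary with an "eagle3" method; the two repository IDs are placeholders, since the summary does not give the exact model names, and the helper functions are hypothetical names introduced here.

```python
from typing import Any

# Placeholder repository IDs (assumptions): the summary does not state
# the exact Hugging Face names, so substitute the real ones before use.
TARGET_MODEL = "<gemma-4-31b-it-repo-id>"
SPECULATOR_MODEL = "<redhatai-eagle3-speculator-repo-id>"


def build_speculative_config(draft_model: str,
                             num_speculative_tokens: int = 3) -> dict[str, Any]:
    """Build a dict in the shape vLLM is assumed to accept as speculative_config."""
    if num_speculative_tokens < 1:
        raise ValueError("num_speculative_tokens must be at least 1")
    return {
        "method": "eagle3",  # assumed method name for EAGLE-3 drafting
        "model": draft_model,
        "num_speculative_tokens": num_speculative_tokens,
    }


def max_tokens_per_step(num_speculative_tokens: int) -> int:
    """Upper bound on tokens emitted per target-model forward pass.

    All drafted tokens can be accepted, and the target model always
    contributes one further token of its own.
    """
    return num_speculative_tokens + 1


def run_demo() -> None:
    """Generate one completion; requires vLLM and suitable GPU hardware."""
    from vllm import LLM, SamplingParams  # imported lazily on purpose

    llm = LLM(
        model=TARGET_MODEL,
        speculative_config=build_speculative_config(SPECULATOR_MODEL),
    )
    params = SamplingParams(temperature=0.0, max_tokens=128)
    outputs = llm.generate(["Write a function that reverses a string."], params)
    print(outputs[0].outputs[0].text)
```

Calling `run_demo()` would load both models, so it is kept separate from the configuration helper. With three speculative tokens, the acceptance length per step can be at most four, which gives a ceiling for reading the reported benchmark figures.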