Open-source language models use mutually incompatible wire formats for tool calling, forcing each inference engine (vLLM, SGLang, etc.) to write its own custom parser for every model family: an M×N duplication of effort across M model families and N engines.
Generic parsers fail because each model's format is an unconstrained training-time decision with no shared convention, leaving edge cases such as reasoning-token leakage and special-token stripping unhandled. The fix is a declarative specification that both grammar engines and output parsers can consume, much as chat templates standardized prompt formatting in the Hugging Face ecosystem.
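To make the idea concrete, here is a minimal sketch of what a declarative, spec-driven parser might look like. The spec shapes and token strings below are illustrative assumptions (loosely modeled on Hermes-style `<tool_call>` tags and Mistral-style `[TOOL_CALLS]` prefixes), not any engine's actual schema; the point is that one generic parser, fed a per-model spec, replaces M×N hand-written parsers.

```python
import json

# Hypothetical declarative specs: each model family declares how its tool
# calls appear on the wire, instead of shipping a bespoke parser. The token
# strings are illustrative approximations of real model output formats.
SPECS = {
    "hermes": {"open": "<tool_call>", "close": "</tool_call>"},
    "mistral": {"open": "[TOOL_CALLS]", "close": ""},  # no closing token
}

def parse_tool_calls(text, spec):
    """Generic parser driven by a declarative spec rather than per-model code."""
    open_tok, close_tok = spec["open"], spec["close"]
    start = text.find(open_tok)
    if start == -1:
        return []  # no tool call emitted
    start += len(open_tok)
    # An empty close token means the payload runs to the end of the output.
    end = text.find(close_tok, start) if close_tok else len(text)
    payload = json.loads(text[start:end].strip())
    return payload if isinstance(payload, list) else [payload]

hermes_out = '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
mistral_out = '[TOOL_CALLS][{"name": "get_weather", "arguments": {"city": "Paris"}}]'

# Both lines print the same normalized structure:
# [{'name': 'get_weather', 'arguments': {'city': 'Paris'}}]
print(parse_tool_calls(hermes_out, SPECS["hermes"]))
print(parse_tool_calls(mistral_out, SPECS["mistral"]))
```

Because the spec is plain data, the same dictionary could in principle also drive a grammar engine that constrains decoding, which is the dual use the text describes.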