Shell + Skills + Compaction: Tips for long-running agents that do real work
Practical patterns for building with skills, hosted shell, and server-side compaction in the Responses API.
OpenAI launched gpt-realtime, its most advanced speech-to-speech model, along with the general availability of the Realtime API.
The model processes audio directly in a unified pipeline, which reduces latency, improves instruction following, and delivers more natural, expressive speech for production-grade voice agents. The guide details prompt-structuring techniques for optimizing conversational agent behavior, covering instruction precision, role definition, tone, pacing, and language control.