OpenAI released gpt-realtime and the Realtime API for speech-to-speech interactions. This guide provides prompting techniques to optimize voice agent behavior, focusing on latency, instruction following, and natural expression.
Highlights
Introduction of gpt-realtime for lower latency and expressive voice output.
Prompting strategies that use bullet points, precise wording, and emphasis to improve instruction adherence.
Techniques for managing tool calls and handling unclear audio inputs.
Methods to reduce robotic repetition and to control which language the agent responds in.
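The highlights above can be illustrated with a small sketch of a system prompt assembled from short bulleted rules. This is only an illustration, not the guide's own prompt: the helper `build_prompt`, the section titles, and every rule string are hypothetical, and the code builds plain text without calling any API. It shows bullets instead of paragraphs, capitalized emphasis on key constraints, a rule for unclear audio, a tool-call rule, sample phrases to vary, and a pinned response language.

```python
def build_prompt(sections: dict[str, list[str]]) -> str:
    """Render each section as a '# Title' line followed by '- ' bullets."""
    blocks = []
    for title, rules in sections.items():
        bullets = "\n".join(f"- {rule}" for rule in rules)
        blocks.append(f"# {title}\n{bullets}")
    # Blank line between sections keeps them visually separate.
    return "\n\n".join(blocks)


# Hypothetical sections and rules, chosen only to mirror the highlights.
SECTIONS = {
    "Instructions": [
        "Keep each reply to 2-3 short sentences.",
        "NEVER guess an order number; ask the caller to repeat it.",
    ],
    "Unclear audio": [
        "ONLY respond to clear audio or text.",
        "If the audio is noisy, silent, or unintelligible, ask for clarification.",
    ],
    "Tools": [
        "Before calling a tool, say one short phrase such as 'Let me check that.'",
    ],
    "Variety": [
        "Do not repeat the same sentence twice; vary confirmations and openers.",
    ],
    "Language": [
        "Respond ONLY in English, even if the caller speaks another language.",
    ],
}

PROMPT = build_prompt(SECTIONS)

if __name__ == "__main__":
    print(PROMPT)
```

Keeping each rule as one bullet makes it easy to reword or emphasize a single constraint and test its effect in isolation. The resulting string would then be supplied as the agent's instructions in whatever way the application configures its session.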