Thariq shares lessons from building Claude Code, where **prompt caching** makes long-running agents practical: each request reuses computation from earlier requests that share the same prefix, which cuts latency and cost.
Because the cache matches on prefixes, the most static content goes first: system prompts and tools, then session context, then messages. Mid-session changes such as tool swaps or model switches cause cache misses and should be avoided. Techniques include delivering updates through messages, deferring tool loading with stubs, and cache-safe compaction, where the compaction request matches the parent's prefix exactly.
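
The ordering rule can be illustrated with a toy model. The sketch below is not how Claude Code or any provider implements caching: it assumes a request is a list of blocks and that the cache reuses only the leading blocks that are identical to the previous request. The names `Block`, `build_request`, and `shared_prefix_len` are hypothetical, and real caches operate on tokens rather than whole blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    kind: str  # "system", "tool", "context", or "message" (illustrative labels)
    text: str


def build_request(system, tools, context, messages):
    """Order blocks from most static to most dynamic."""
    blocks = [Block("system", system)]
    blocks += [Block("tool", t) for t in tools]
    blocks.append(Block("context", context))
    blocks += [Block("message", m) for m in messages]
    return blocks


def shared_prefix_len(previous, current):
    """Count the leading blocks that are identical in both requests."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


SYSTEM = "You are a coding agent."
tools = ["read_file", "edit_file"]

turn1 = build_request(SYSTEM, tools, "repo: demo", ["fix the bug"])

# Sending an update as a new message leaves the whole earlier request intact.
turn2 = build_request(SYSTEM, tools, "repo: demo",
                      ["fix the bug", "note: tests now pass"])

# Swapping a tool changes an early block, so every block after it misses.
swapped = build_request(SYSTEM, ["read_file", "run_shell"], "repo: demo",
                        ["fix the bug"])

print(shared_prefix_len(turn1, turn2))    # 5: all of turn1 is reused
print(shared_prefix_len(turn1, swapped))  # 2: only system + first tool match
```

In this model, appending a message reuses all five blocks of the earlier request, while a single tool swap near the front leaves only two reusable blocks, which is why changes are pushed toward the end of the request.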