Terminal-Bench ranks AI agents on terminal-based tasks. Warp leads with 52% accuracy using Claude models. The leaderboard compares a range of agent frameworks and model providers.
Highlights
Warp, using claude-4-sonnet, leads with 52.0% accuracy.
Anthropic models dominate the top rankings.
OpenAI models generally rank lower in this benchmark.
A diverse set of agents, including Goose and Terminus, is evaluated.
GLM-5 is a 744B-parameter MoE model (40B active) from Zhipu AI, scaled up from GLM-4.5's 355B with 28.5T pre-training tokens and DeepSeek Sparse Atten...
github.com
spring-ai-agent-utils/spring-ai-agent-utils at main · spring-ai-community/spring-ai-agent-utils
A Spring AI library that brings Claude Code-inspired tools and agent skills to your AI applications.
getconduit.sh
Conduit - Multi-Agent TUI for AI Coding Assistants
A multi-agent TUI for orchestrating AI coding assistants. Run Claude Code and Codex CLI side-by-side in your terminal.