Holo3.1 is an updated family of computer-use models built to operate robustly across web, desktop, and mobile environments and across diverse agent frameworks.
It introduces quantized checkpoints (FP8, Q4 GGUF, NVFP4) and smaller model sizes (0.8B, 4B, 9B), enabling fast, cost-effective, fully local inference on consumer and enterprise hardware with minimal performance loss. Benchmarks show substantial accuracy gains on mobile tasks (e.g., AndroidWorld) and improved performance across agent harnesses; the models also run substantially faster when combined with NVIDIA-optimized runtimes.
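
To make the local-inference claim concrete, here is a rough back-of-envelope sketch of how much memory the weights alone might occupy for each model size and checkpoint format. The bits-per-weight values are nominal assumptions for illustration (real FP8, Q4 GGUF, and NVFP4 files carry extra scale factors and metadata, so actual sizes run somewhat higher), and the BF16 baseline is an assumed reference point that the text does not mention. The estimate also ignores activations and the KV cache.

```python
# Nominal bits per weight for each format. These values are assumptions
# for illustration; real checkpoints add overhead for scales and metadata.
# BF16 is an assumed unquantized baseline, not something stated in the text.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "Q4 GGUF": 4, "NVFP4": 4}

# Parameter counts taken from the model sizes named in the text.
MODEL_SIZES = {"0.8B": 0.8e9, "4B": 4e9, "9B": 9e9}


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for size, params in MODEL_SIZES.items():
        row = ", ".join(
            f"{fmt}: {weight_memory_gb(params, bits):.1f} GB"
            for fmt, bits in BITS_PER_WEIGHT.items()
        )
        print(f"{size} -> {row}")
```

Under these assumptions, a 9B model at 4 bits per weight needs roughly 4.5 GB for its weights, which suggests why the 4-bit checkpoints are the natural fit for consumer GPUs and laptops.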