GLM-5: From Vibe Coding to Agentic Engineering
GLM-5 is a 744B-parameter MoE model (40B active) from Zhipu AI, scaled up from GLM-4.5's 355B with 28.5T pre-training tokens and DeepSeek Sparse Atten...
GLM-4.6V is a multimodal vision-language model from Z.ai available in two versions: a 106B foundation model for cloud deployment and a 9B lightweight variant for local use.
The model processes up to 128K tokens and achieves state-of-the-art performance in visual understanding, featuring native function calling that integrates visual perception with executable actions for multimodal agent tasks. Key capabilities include direct processing of images and documents without text conversion, generation of interleaved image-text content, and pixel-accurate HTML/CSS reconstruction from UI screenshots.