GLM-4.6V is a multimodal vision-language model from Z.ai, available in two sizes: a 106B-parameter foundation model for cloud deployment and a lightweight 9B variant for local use.
The model supports contexts of up to 128K tokens and achieves state-of-the-art performance in visual understanding. Its native function calling ties visual perception to executable actions, enabling multimodal agent tasks. Key capabilities include direct processing of images and documents without conversion to text, generation of interleaved image-text content, and pixel-accurate HTML/CSS reconstruction from UI screenshots.
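To make the image-in, text-out flow concrete, here is a minimal inference sketch using Hugging Face transformers. The repo id `zai-org/GLM-4.6V`, the `AutoModelForImageTextToText` loading path, and the chat-template message format are assumptions based on common VLM usage patterns, not confirmed details from this card; the image URL and prompt are placeholders.

```python
# A minimal sketch of multimodal inference, assuming a hypothetical repo id
# "zai-org/GLM-4.6V" and the standard transformers image-text-to-text API;
# consult the model card for the exact usage.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "zai-org/GLM-4.6V"  # hypothetical repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# One user turn mixing an image with a text instruction; the model consumes
# the screenshot directly, with no OCR or text-conversion step.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/screenshot.png"},
            {"type": "text", "text": "Reconstruct this UI as HTML/CSS."},
        ],
    }
]

# The processor's chat template renders the mixed content into model inputs.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```

The same message structure extends to documents or multiple images by adding further `{"type": "image", ...}` entries to the content list.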
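For the native function-calling capability, the sketch below shows one plausible way a tool schema could be passed through the chat template so the model can answer with an executable action. The `tools` argument, the `click` tool, and its JSON schema are all illustrative assumptions; the card's own documentation would define the actual tool format.

```python
# A hedged sketch of function calling against a screenshot, again assuming the
# hypothetical repo id "zai-org/GLM-4.6V" and that the model's chat template
# accepts the standard `tools` argument. The "click" tool is invented here
# purely for illustration.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "zai-org/GLM-4.6V"  # hypothetical repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A JSON-schema tool description: a UI "click" action the model may invoke
# after inspecting the screenshot.
tools = [{
    "type": "function",
    "function": {
        "name": "click",
        "description": "Click at a pixel coordinate on the screen.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "integer", "description": "Horizontal pixel position."},
                "y": {"type": "integer", "description": "Vertical pixel position."},
            },
            "required": ["x", "y"],
        },
    },
}]

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/screenshot.png"},
        {"type": "text", "text": "Open the Settings menu."},
    ],
}]

# The chat template renders the tool schemas into the prompt; the model is
# then expected to emit a structured call such as click(x=..., y=...).
inputs = processor.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True,
    tokenize=True, return_dict=True, return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```

In an agent loop, the emitted call would be parsed, executed against the UI, and the resulting screenshot fed back as the next image turn.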