zlab-princeton/Vero-Qwen3I-8B

VISION · Model size: 8B · Quantization: FP8 · Context length: 32k · Concurrency cost: 1 · Published: Mar 28, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Vero-Qwen3I-8B is an 8-billion-parameter open visual language model developed by zlab-princeton, part of the Vero family of models. It is fine-tuned from Qwen3-VL-8B-Instruct on 600K curated reinforcement learning samples drawn from 59 datasets. The model targets broad multimodal reasoning, excelling across categories such as charts, STEM, spatial reasoning, and instruction following, and achieves state-of-the-art performance on the VeroEval benchmark suite.


Vero-Qwen3I-8B: An Open Visual Reasoning Model

Vero-Qwen3I-8B is an 8-billion-parameter visual language model developed by zlab-princeton, built on the Qwen3-VL-8B-Instruct base model. It is part of the Vero open RL model family, which releases models, data, evaluation, and training code for comprehensive multimodal reasoning.
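
Because the weights are openly released, the model can be served with any stack that supports its Qwen3-VL architecture. The snippet below is a minimal sketch, assuming the model is exposed behind an OpenAI-compatible endpoint (for example, one started with vLLM); the local URL, image link, and question are placeholder values, not part of the official release.

```python
# Minimal sketch: query Vero-Qwen3I-8B through an OpenAI-compatible server.
# Assumes something like `vllm serve zlab-princeton/Vero-Qwen3I-8B` is running
# locally; the endpoint URL, image URL, and prompt below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="zlab-princeton/Vero-Qwen3I-8B",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
                {"type": "text", "text": "Which month shows the highest revenue?"},
            ],
        }
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
```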

Key Capabilities & Features

  • Broad Visual Reasoning: Trained on 600K curated RL samples from 59 datasets, covering 6 visual reasoning categories.
  • Diverse Applications: Excels in tasks involving charts and OCR, STEM, spatial and action understanding, knowledge and recognition, grounding and counting, and captioning and instruction following.
  • State-of-the-Art Performance: Achieves SOTA results among 8B models on VeroEval, a 30-benchmark suite designed for general visual reasoning.
  • Open Release: Includes fully open models, training code, evaluation, and the Vero-600K dataset.
  • Reasoning Trace Generation: Generates a reasoning trace inside `<think>` tags before giving its final answer inside `<answer>` tags, which aids interpretability (see the parsing sketch below this list).
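
Since the reasoning trace and the final answer are delimited by these tags, downstream code can separate them with a simple pattern match. The helper below is a hypothetical sketch (the function name and fallback behavior are not part of the release), assuming the output follows the `<think>...</think><answer>...</answer>` convention described above.

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split model output into (reasoning trace, final answer).

    Assumes the <think>...</think> / <answer>...</answer> convention;
    falls back to the raw output if either tag is missing.
    """
    think = re.search(r"<think>(.*?)</think>", output, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", output, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    final = answer.group(1).strip() if answer else output.strip()
    return reasoning, final

reasoning, final = split_reasoning(
    "<think>The tallest bar corresponds to March.</think><answer>March</answer>"
)
print(final)  # -> March
```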

When to Use This Model

This model is ideal for applications requiring robust visual reasoning across a wide array of domains. Its strengths lie in interpreting complex visual information, performing STEM-related tasks, and following instructions based on visual input. It's particularly well-suited for research and development in multimodal AI due to its open-source nature and comprehensive evaluation suite.