prefeitura-rio/Rio-2.5-Open-VL

VISIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Feb 3, 2026License:mitArchitecture:Transformer0.0K Open Weights Cold

Rio-2.5-Open-VL is a 4 billion parameter vision-language model developed by prefeitura-rio, based on the Qwen3-VL-4B-Instruct architecture. This model supports a context length of 32768 tokens and is designed for image-text-to-text tasks. It is optimized for processing both Portuguese and English inputs, making it suitable for multilingual visual understanding applications.

Loading preview...

Rio-2.5-Open-VL: A Multilingual Vision-Language Model

Rio-2.5-Open-VL is a 4 billion parameter vision-language model developed by prefeitura-rio. It is built upon the Qwen3-VL-4B-Instruct architecture, inheriting its robust capabilities for multimodal understanding. This model is specifically designed to handle image-text-to-text tasks, allowing it to process visual inputs alongside textual prompts and generate relevant text outputs.

Key Capabilities

  • Multimodal Understanding: Integrates both image and text inputs to generate coherent textual responses.
  • Multilingual Support: Optimized for processing content in both Portuguese (pt) and English (en).
  • Large Context Window: Features a substantial context length of 32768 tokens, enabling the processing of longer and more complex inputs.
  • Open-Source Foundation: Based on the Qwen3-VL-4B-Instruct model, providing a strong and adaptable base for various applications.

Good For

  • Image Captioning: Generating descriptions for images in Portuguese or English.
  • Visual Question Answering (VQA): Answering questions about images based on their content.
  • Multilingual Content Generation: Creating text that combines visual information with prompts in either supported language.
  • Research and Development: Serving as a foundation for further fine-tuning and experimentation in multimodal AI, particularly for applications requiring Portuguese language support.