prefeitura-rio/Rio-2.5-Open-VL
Rio-2.5-Open-VL is a 4 billion parameter vision-language model developed by prefeitura-rio, based on the Qwen3-VL-4B-Instruct architecture. This model supports a context length of 32768 tokens and is designed for image-text-to-text tasks. It is optimized for processing both Portuguese and English inputs, making it suitable for multilingual visual understanding applications.
Loading preview...
Rio-2.5-Open-VL: A Multilingual Vision-Language Model
Rio-2.5-Open-VL is a 4 billion parameter vision-language model developed by prefeitura-rio. It is built upon the Qwen3-VL-4B-Instruct architecture, inheriting its robust capabilities for multimodal understanding. This model is specifically designed to handle image-text-to-text tasks, allowing it to process visual inputs alongside textual prompts and generate relevant text outputs.
Key Capabilities
- Multimodal Understanding: Integrates both image and text inputs to generate coherent textual responses.
- Multilingual Support: Optimized for processing content in both Portuguese (pt) and English (en).
- Large Context Window: Features a substantial context length of 32768 tokens, enabling the processing of longer and more complex inputs.
- Open-Source Foundation: Based on the Qwen3-VL-4B-Instruct model, providing a strong and adaptable base for various applications.
Good For
- Image Captioning: Generating descriptions for images in Portuguese or English.
- Visual Question Answering (VQA): Answering questions about images based on their content.
- Multilingual Content Generation: Creating text that combines visual information with prompts in either supported language.
- Research and Development: Serving as a foundation for further fine-tuning and experimentation in multimodal AI, particularly for applications requiring Portuguese language support.