Name: RohitUltimate/Qwen3.5_VL_2B_12k API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: RohitUltimate

Model Overview

RohitUltimate/Qwen3.5_VL_2B_12k is a specialized vision-language model, building upon the Qwen3.5-2B architecture. It has been meticulously fine-tuned to excel in image-text-to-text tasks, offering enhanced instruction-following and multimodal understanding capabilities.

Key Capabilities

Vision-Language Integration: Processes both image and text inputs to generate text outputs.
Extended Context Window: Supports a substantial context length of 32768 tokens, allowing for processing longer inputs and maintaining conversational coherence.
Optimized for Specific Tasks: Demonstrates improved performance in instruction-following and multimodal understanding, particularly aligned for bank statement extraction.
Efficient Deployment: Designed to operate effectively on GPUs with less than 8GB VRAM, making it suitable for cost-effective and resource-constrained environments.

Deployment

The model can be efficiently served using the vLLM inference pipeline, which is known for its high throughput and memory efficiency. This allows for robust deployment even with its extended context capabilities.

Use Cases

This model is particularly well-suited for applications requiring:

Automated extraction and analysis of information from bank statements.
Multimodal instruction-following where both visual and textual cues are critical.
Applications needing a powerful yet VRAM-efficient vision-language model with a long context window.

Overview

Model Overview

Key Capabilities

Deployment

Use Cases

Full Model Card (README)