Name: RohitUltimate/Qwen3.5-2B_20K API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: RohitUltimate

Model Overview

RohitUltimate/Qwen3.5-2B_20K is a 2.3 billion parameter vision-language model, building upon the Qwen3.5-2B architecture. It has been specifically fine-tuned for image-text-to-text tasks, demonstrating enhanced performance in instruction-following and multimodal understanding.

Key Capabilities

Vision-Language Integration: Processes both image and text inputs to generate text outputs.
Extended Context Window: Supports an impressive context length of 12,000 tokens, allowing for more comprehensive input processing.
Optimized for Bank Statement Extraction: Benefits from high-quality training data and alignment tailored for this specific application.
Efficient Deployment: Designed to run effectively on GPUs with less than 8GB VRAM, making it suitable for low-cost inference environments.
Improved Instruction Following: Shows better adherence to instructions compared to its base model.

When to Use This Model

This model is particularly well-suited for:

Applications requiring multimodal understanding where both visual and textual information are crucial.
Tasks involving the extraction of information from bank statements or similar document processing scenarios.
Deployments where GPU memory is limited (under 8GB VRAM) but robust vision-language capabilities are needed.
Use cases demanding an extended context window for processing longer or more complex inputs.

Overview

Model Overview

Key Capabilities

When to Use This Model

Full Model Card (README)