raxcore-dev/Rax-4.5

VISIONConcurrency Cost:1Model Size:2.3BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Nov 27, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Raxcore-dev's Rax 4.5 is a 2 billion parameter multimodal vision-language model designed for efficient production use. It features a 262,144 token context window and a hybrid attention architecture, enabling true multimodal understanding of images and text. Optimized for speed and memory efficiency, Rax 4.5 excels at visual question answering, document analysis, and complex multimodal reasoning tasks.

Loading preview...

Rax 4.5: Efficient 2B Vision Language Model

Rax 4.5 is a 2 billion parameter multimodal vision-language model developed by raxcore-dev, engineered for high efficiency and production readiness. It uniquely combines vision and text processing with an impressive 262,144 token context window, allowing for complex tasks involving extensive documents and visual elements. The model's hybrid attention architecture (alternating linear and full attention) and optimized KV cache contribute to its speed and memory efficiency, making it suitable for real-world deployments.

Key Capabilities

  • True Multimodal Understanding: Processes both images and text inputs seamlessly.
  • Long Context Processing: Handles very long sequences, beneficial for document analysis and visual QA.
  • Memory Efficient: Designed with a hybrid attention mechanism and optimized KV cache to reduce VRAM usage.
  • Production Ready: Compatible with vLLM, SGLang, and Hugging Face Transformers for easy integration.

Good For

  • Document Analysis: Extracting data from invoices, receipts, and forms.
  • Visual Question Answering: Building systems that answer questions based on images and text.
  • Content Moderation: Analyzing images with contextual understanding.
  • Accessibility: Generating detailed image descriptions for visually impaired users.
  • E-commerce: Product analysis and description generation.