thelamapi/next2-air
Next2-Air is a 2.3 billion parameter Vision-Language Model (VLM) developed by Lamapi in Türkiye, built on the Qwen 3.5-2B architecture. It is optimized for lightweight, fast inference on local machines and edge devices, featuring multimodal understanding, logical reasoning with Chain-of-Thought, and native bilingual support for Turkish and English. The model supports a substantial 262,144 token context length, making it suitable for extensive document processing and real-time applications.
Overview
Next2-Air is a 2.3 billion parameter Vision-Language Model (VLM) developed by Lamapi, based on the Qwen 3.5-2B architecture. It is designed for lightweight, fast, and capable performance on local machines and edge devices, emphasizing reasoning and multimodal understanding. The model is instruction-tuned using specialized datasets to enhance logical deduction and image processing, offering native support for both Turkish and English.
Key Capabilities
- Optimized for Edge: Runs efficiently on MacBooks, mid-range PCs, and edge hardware without requiring powerful GPUs.
- Multimodal Understanding: Processes images, performs OCR, and understands visual context.
- Advanced Reasoning: Utilizes Chain-of-Thought (<think>) for logical deduction.
- Extensive Context: Supports a native context length of 262,144 tokens, ideal for long document summarization.
- Bilingual Proficiency: Fine-tuned for natural, fluent, and accurate responses in both Turkish and English.
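Because the model exposes its Chain-of-Thought through a <think> marker, an application will usually want to separate that reasoning from the final answer before showing it to a user. The following is a minimal sketch of one way to do so. It assumes the reasoning is wrapped in a matching `<think>...</think>` pair (the closing tag is an assumption; the card only shows the opening marker), and the names `split_reasoning` and `ParsedReply` are hypothetical helpers, not part of any Next2-Air API.

```python
import re
from typing import NamedTuple

# Assumption: reasoning is delimited by <think> ... </think>.
# The model card only shows the opening marker.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


class ParsedReply(NamedTuple):
    reasoning: str  # text found inside the think block(s), may be empty
    answer: str     # everything outside the think block(s)


def split_reasoning(raw: str) -> ParsedReply:
    """Separate chain-of-thought text from the user-facing answer."""
    blocks = [block.strip() for block in THINK_BLOCK.findall(raw)]
    answer = THINK_BLOCK.sub("", raw).strip()
    return ParsedReply("\n".join(blocks), answer)


# Hand-written sample string, not real model output.
sample = "<think>\n17 * 3 = 51, and 51 + 4 = 55.\n</think>\nThe result is 55."
parsed = split_reasoning(sample)
print(parsed.reasoning)
print(parsed.answer)
```

A reply with no think block passes through unchanged as the answer, so the same helper can sit in front of both reasoning and non-reasoning responses.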
Benchmark Performance
Next2-Air is reported to be competitive in the ultra-lightweight category, often surpassing its base model and rivaling larger 3B-4B models. On text, reasoning, and instruction-following benchmarks it scores 68.2% on MMLU-Pro, 82.1% on MMLU-Redux, and 82.5% on IFEval. On multimodal benchmarks it scores 66.5% on MMMU, 78.1% on MathVision, and 86.0% on OCRBench.
Ideal Use Cases
- Mobile & Edge AI: Deploying smart assistants on smartphones or Raspberry Pi.
- Real-Time OCR & Parsing: Quickly extracting data from receipts, invoices, or UI screenshots.
- Fast Conversational Bots: Providing low-latency responses in Turkish and English.
- Gaming & NPC Logic: Serving as a fast reasoning engine for dynamic in-game characters.