Overview

Next2-Air is a 2.3-billion-parameter Vision-Language Model (VLM) developed by Lamapi and based on the Qwen 3.5-2B architecture. It is designed to be lightweight and fast on local machines and edge devices while remaining capable at reasoning and multimodal understanding. The model is instruction-tuned on specialized datasets to improve logical deduction and image processing, and it natively supports both Turkish and English.
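To put the parameter count in edge-deployment terms, the sketch below estimates how much memory the weights alone would occupy at a few common precisions. It is a back-of-the-envelope calculation, not a measured figure: the bytes-per-parameter values are general properties of the numeric formats, the helper name weight_memory_gb is hypothetical, and the estimate ignores the KV cache, activations, and runtime overhead. Which precisions are actually published for Next2-Air is not stated here.

```python
# Rough weight-memory estimate for a 2.3B-parameter model.
# Assumption: weights only; KV cache, activations, and runtime overhead are excluded.

PARAMS = 2.3e9  # parameter count stated in the overview

# Bytes per parameter for common weight formats (format facts, not Next2-Air specifics).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, size in BYTES_PER_PARAM.items():
        print(f"{fmt:>9}: ~{weight_memory_gb(PARAMS, size):.2f} GB")
```

Under these assumptions, 16-bit weights come to roughly 4.6 GB and 4-bit weights to roughly 1.15 GB, which is the kind of footprint that fits a laptop or a small single-board computer.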

Key Capabilities

Optimized for Edge: Runs efficiently on MacBooks, mid-range PCs, and edge hardware without requiring powerful GPUs.
Multimodal Understanding: Processes images, performs OCR, and understands visual context.
Advanced Reasoning: Uses Chain-of-Thought (<think>) for logical deduction.
Extensive Context: Supports a native context length of 262,144 tokens, which suits long-document summarization.
Bilingual Proficiency: Fine-tuned for natural, fluent, and accurate responses in both Turkish and English.
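Applications that consume Chain-of-Thought output usually want to show the final answer and keep the reasoning separate. The sketch below is one way to do that, assuming the model wraps its reasoning in a <think>...</think> pair. Only the opening <think> tag is mentioned above, so the closing tag and the helper name split_reasoning are assumptions for illustration, not a documented part of the model's output format.

```python
import re
from typing import Tuple

# Assumption: reasoning is delimited by <think> ... </think>; only <think> is
# mentioned in the capability list, so the closing tag is a guess.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer) extracted from raw model output.

    All <think> blocks are joined into the reasoning string; whatever
    remains after removing them is treated as the answer.
    """
    reasoning = "\n".join(block.strip() for block in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


if __name__ == "__main__":
    raw = "<think>The receipt lists 3 items at 5 each.</think>The total is 15."
    thoughts, final = split_reasoning(raw)
    print("reasoning:", thoughts)
    print("answer:", final)
```

If the output contains no <think> block, the function returns an empty reasoning string and the full text as the answer, so it can be applied to every response without a separate check.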

Benchmark Performance

Next2-Air is competitive in the ultra-lightweight category, often surpassing its base model and competing with larger 3B-4B models. It shows gains on text, reasoning, and instruction-following benchmarks, scoring 68.2% on MMLU-Pro, 82.1% on MMLU-Redux, and 82.5% on IFEval. On multimodal tasks, it scores 66.5% on MMMU, 78.1% on MathVision, and 86.0% on OCRBench.

Ideal Use Cases

Mobile & Edge AI: Deploying smart assistants on smartphones or a Raspberry Pi.
Real-Time OCR & Parsing: Quickly extracting data from receipts, invoices, or UI screenshots.
Fast Conversational Bots: Providing low-latency responses in Turkish and English.
Gaming & NPC Logic: Serving as a fast reasoning engine for dynamic in-game characters.
