thelamapi/next-4b

VISIONConcurrency Cost:1Model Size:4.3BQuant:BF16Ctx Length:32kPublished:Oct 15, 2025License:mitArchitecture:Transformer0.0K Open Weights Cold

Thelamapi/next-4b is a 4.3 billion parameter multimodal Vision-Language Model (VLM) based on Gemma 3, developed by Lamapi. As Türkiye's first open-source VLM, it is fine-tuned for efficient reasoning and context-aware multimodal outputs, handling both text and images. It supports multilingual capabilities, including Turkish, and is optimized for low-resource deployment using 8-bit quantization on consumer-grade GPUs. This model excels at visual understanding, reasoning, and creative generation for researchers and developers.

Loading preview...

Next 4B: Türkiye's First Vision-Language Model

Next 4B is a 4.3 billion parameter multimodal Vision-Language Model (VLM) developed by Lamapi, built upon the Gemma 3 architecture. It stands out as Türkiye's first open-source VLM, designed for efficient handling of both text and images. The model is specifically fine-tuned for robust reasoning and generating context-aware multimodal outputs.

Key Capabilities

  • Multimodal Understanding: Processes and generates content from both text and image inputs.
  • Efficient Deployment: Optimized for low VRAM environments, supporting 8-bit quantization for consumer-grade GPUs.
  • Multilingual Support: Offers strong capabilities in Turkish, alongside broader multilingual understanding.
  • Advanced Reasoning: Excels in logical and analytical reasoning tasks across modalities.
  • Consistent Outputs: Provides reliable and reproducible responses.

Performance Highlights

Next 4B demonstrates competitive performance, particularly in mathematical reasoning, achieving 82.7% on GSM8K and 70.5% on MATH. While its MMLU (5-shot) score is 84.6%, its smaller counterpart, Next 1B, shows even higher scores in MMLU and GSM8K, indicating strong performance within the tiny model category.

Good For

  • Image Captioning and Multimodal QA: Generating descriptions for images and answering questions based on visual and textual context.
  • Text Generation and Reasoning: Creating coherent text and performing analytical tasks.
  • Creative Storytelling: Developing narratives that integrate both visual and textual elements.
  • Low-Resource Applications: Ideal for deployment on consumer-grade GPUs due to its efficiency and 8-bit quantization support.
  • Research and Development: An open-source solution for exploring multimodal AI, especially with a focus on Turkish language contexts.