Lamapi/next-4b

Vision · 4.3B parameters · BF16 · 32,768 context · License: MIT

Overview of Lamapi/next-4b

Lamapi/next-4b is a 4.3-billion-parameter multimodal Vision-Language Model (VLM) built on the Gemma 3 architecture. It is notable as Türkiye's first open-source VLM, specifically fine-tuned to efficiently process and reason over both text and images. The model emphasizes reasoning and context-aware multimodal outputs, with robust support for Turkish alongside broader multilingual capabilities.
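
For a quick start, the sketch below shows minimal image-and-text inference with the Hugging Face transformers library. It assumes Lamapi/next-4b exposes the standard Gemma 3 style image-text-to-text interface (AutoProcessor / AutoModelForImageTextToText with a multimodal chat template); the image path and prompt are placeholders, so verify against the repository's own usage snippet before relying on it.

```python
# Minimal inference sketch (assumptions noted above; not an official example).
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "Lamapi/next-4b"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",
)

# Chat-style multimodal prompt; "ornek.jpg" is a placeholder image path.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": Image.open("ornek.jpg")},
            {"type": "text", "text": "Bu görselde ne görüyorsun?"},  # "What do you see in this image?"
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```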

Key Capabilities

  • Multimodal Understanding: Processes and reasons over both image and text inputs.
  • Efficiency: Optimized for low-VRAM environments, supporting 8-bit quantization for deployment on consumer-grade GPUs (see the quantized-loading sketch after this list).
  • Multilingual Support: Handles complex Turkish text with high accuracy, in addition to other languages.
  • Advanced Reasoning: Capable of logical and analytical reasoning across both visual and textual data.
  • Consistent Outputs: Designed to provide reliable and reproducible responses.

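The 8-bit deployment path mentioned above can be approached with bitsandbytes quantization at load time. The sketch below is illustrative and assumes the bitsandbytes package is installed; the quantization settings are an assumption, not the model authors' recommended configuration.

```python
# Hedged sketch: 8-bit loading for low-VRAM, consumer-grade GPUs.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText, BitsAndBytesConfig

model_id = "Lamapi/next-4b"
quant_config = BitsAndBytesConfig(load_in_8bit=True)  # illustrative setting

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    quantization_config=quant_config,  # 8-bit weights via bitsandbytes
    device_map="auto",
)
# Inference then proceeds exactly as in the earlier sketch.
```
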
Good For

  • Researchers and Developers: Ideal for those needing a high-performance, accessible multimodal AI.
  • Visual Understanding Tasks: Excels at image captioning, multimodal question answering, and visual reasoning.
  • Text Generation: Capable of creative storytelling and general text generation.
  • Low-Resource Deployment: Suitable for applications requiring efficient operation on modest hardware.