Lamapi/next-12b

Vision · 12B parameters · FP8 · 32768-token context · License: MIT
Overview

Next 12B: Türkiye's Advanced Vision-Language Model

Lamapi's Next 12B is a 12-billion-parameter multimodal vision-language model (VLM) built on Gemma 3, designed for strong performance in both text and image understanding. It stands out as Türkiye's most advanced open-source VLM, combining high performance, multimodal capabilities, and enterprise readiness.

Key Capabilities

  • Advanced Vision-Language Understanding: Deeply understands images with sophisticated visual reasoning and generates detailed descriptions.
  • Multilingual Support: Provides professional-grade Turkish language support while maintaining extensive multilingual capabilities.
  • Superior Reasoning: Demonstrates strong logical and analytical reasoning for complex tasks, achieving 92.7% on MMLU and 95.3% on GSM8K benchmarks.
  • Optimized Architecture: Features a balanced architecture pairing a causal language model with an enhanced vision encoder, and supports multiple quantization options (Q8_0, Q4_K_M, F16, F32) for flexible deployment.
  • Production-Ready: Delivers reliable and consistent outputs suitable for enterprise applications.

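As a rough illustration of how the quantization options above trade memory for precision, the sketch below estimates the weight footprint of a 12B-parameter model under each format. The bytes-per-parameter figures are common rules of thumb (Q4_K_M in particular is a mixed scheme whose effective size varies), not measurements of this model's actual files.

```python
# Back-of-the-envelope weight-memory estimates for a 12B-parameter model
# under the quantization formats listed above. Bytes-per-parameter values
# are approximate rules of thumb, not exact file sizes for this model.

PARAMS = 12e9  # 12 billion parameters

# Approximate effective bytes per parameter for each format (assumed).
BYTES_PER_PARAM = {
    "F32": 4.0,      # full precision
    "F16": 2.0,      # half precision
    "Q8_0": 1.0,     # ~8-bit quantization
    "Q4_K_M": 0.56,  # ~4.5-bit mixed quantization (rough estimate)
}

def weight_size_gb(fmt: str, params: float = PARAMS) -> float:
    """Estimated weight footprint in gigabytes for a given format."""
    return params * BYTES_PER_PARAM[fmt] / 1e9

if __name__ == "__main__":
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt:>7}: ~{weight_size_gb(fmt):.1f} GB")
```

Note that these figures cover weights only; activation memory and the KV cache (which grows with the 32768-token context) add to the total at inference time.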
Good For

  • Enterprise Content Generation: Creating high-quality multilingual content.
  • Advanced Visual Analysis: Detailed image understanding, captioning, and multimodal question answering.
  • Complex Reasoning Tasks: Solving mathematical problems (87.2% on MATH benchmark) and handling professional-level questions (84.4% on MMLU-Pro).
  • Educational Applications: Developing tutoring and explanation systems.
  • Customer Support: Automating multilingual customer service.
  • Data Extraction: Processing visual documents and extracting information.
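The multimodal question-answering and data-extraction use cases above typically start from a chat-style message that interleaves image and text content. The sketch below builds such a message in the interleaved-content format commonly accepted by Hugging Face processors' chat templates; the exact schema this model's processor expects is an assumption here, not something the card confirms, so check the model's processor configuration before relying on these field names.

```python
# Build a chat-style multimodal message for visual document extraction.
# The interleaved image/text content format follows the common Hugging Face
# chat-template convention; the exact schema accepted by this model's
# processor is an assumption, not confirmed by the model card.

def build_extraction_message(image_path: str, question: str) -> list[dict]:
    """Return a single-turn conversation asking a question about an image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_extraction_message(
    "invoice.png",  # hypothetical input file for illustration
    "Extract the invoice number, date, and total amount as JSON.",
)
print(messages[0]["role"])          # prints: user
print(len(messages[0]["content"]))  # prints: 2  (image part + text part)
```

A structure like this would then be passed to the model's processor (e.g. via `apply_chat_template`) together with the loaded image before generation.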