ttCC3/gemma-3-12b-it
ttCC3/gemma-3-12b-it is a 12-billion-parameter, instruction-tuned multimodal model from Google DeepMind's Gemma 3 family. It accepts text and image inputs and generates text outputs, with a 128K-token context window and support for over 140 languages. The model targets text generation and image understanding tasks such as question answering, summarization, and reasoning, and is designed for deployment in resource-limited environments.
Gemma 3 12B Instruction-Tuned Model
This model is a 12 billion parameter instruction-tuned variant from Google DeepMind's Gemma 3 family, built using the same research and technology as the Gemini models. It is a multimodal model, accepting both text and image inputs to produce text outputs, and offers open weights. A key feature is its substantial 128K token context window, alongside extensive multilingual support for over 140 languages.
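As a rough guide to what 12 billion parameters means for hardware, the sketch below estimates the memory needed just to hold the weights in a few common numeric formats. The bytes-per-parameter values and the nominal count of exactly 12e9 parameters are assumptions made for illustration; real usage is higher once activations, the KV cache, and any quantization metadata are included.

```python
# Back-of-the-envelope weight memory estimate (illustrative, not a measurement).
# Assumption: nominal bytes per parameter for each format, ignoring
# quantization scales/zero-points and runtime overhead.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

# Assumption: the "12 billion" in the model name taken as exactly 12e9.
NUM_PARAMS = 12e9


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in GiB for a given format."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_memory_gib(NUM_PARAMS, dtype):6.1f} GiB")
```

Under these assumptions, 16-bit weights alone need a little over 22 GiB, which is why lower-precision formats are commonly considered for laptop or desktop deployment.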
Key Capabilities
- Multimodal Understanding: Processes both text and images (normalized to 896x896 resolution, encoded to 256 tokens each) to generate relevant text.
- Extended Context: Supports a total input context of 128K tokens, enabling processing of longer and more complex inputs.
- Multilingual Support: Trained on web documents in over 140 languages, enhancing its ability to understand and generate text across diverse linguistic contexts.
- Versatile Text Generation: Excels at tasks such as question answering, summarization, and reasoning, making it suitable for various applications.
- Resource-Efficient Deployment: Its relatively compact size allows for deployment on devices with limited resources, including laptops, desktops, and private cloud infrastructure.
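The image and context figures above imply a simple token budget: each image costs a fixed 256 tokens, and everything must fit inside the 128K-token input context. The following sketch works through that arithmetic. It assumes "128K" means 128 × 1024 = 131,072 tokens and ignores any special tokens a chat template may add around images or turns; the function names are hypothetical helpers, not part of any library.

```python
# Token-budget arithmetic for mixed text and image prompts (illustrative sketch).
# Assumption: "128K" is interpreted as 128 * 1024 tokens.
CONTEXT_TOKENS = 128 * 1024
# From the model description: each image is encoded to 256 tokens.
TOKENS_PER_IMAGE = 256


def remaining_text_tokens(num_images: int, context_tokens: int = CONTEXT_TOKENS) -> int:
    """Tokens left for text after reserving space for `num_images` images."""
    used = num_images * TOKENS_PER_IMAGE
    if used > context_tokens:
        raise ValueError("images alone exceed the context window")
    return context_tokens - used


def max_images(text_tokens: int, context_tokens: int = CONTEXT_TOKENS) -> int:
    """Largest number of images that fit alongside `text_tokens` of text."""
    return max(0, (context_tokens - text_tokens) // TOKENS_PER_IMAGE)


if __name__ == "__main__":
    print(remaining_text_tokens(10))  # text budget left after 10 images
    print(max_images(1072))           # images that fit next to 1,072 text tokens
```

Because the per-image cost is fixed regardless of the original resolution, the budget depends only on the image count, not on image size.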
Training and Performance
The 12B model was trained on 12 trillion tokens spanning web documents, code, mathematics, and images. Reported benchmark results cover reasoning (e.g., 72.6 on BIG-Bench Hard), STEM and code (e.g., 74.5 on MMLU, 71.0 on GSM8K), and multimodal tasks (e.g., 71.2 on VQAv2). The training data was filtered for CSAM and sensitive data, and the model was evaluated for child safety, content safety, and representational harms, showing significant improvements over previous Gemma models.