314e/abstrakthealth-rerun-VLM-Gemma3-Entity

Vision · Concurrency Cost: 1 · Model Size: 12B · Quant: FP8 · Ctx Length: 32k · Published: Jan 14, 2026 · License: gemma · Architecture: Transformer

The 314e/abstrakthealth-rerun-VLM-Gemma3-Entity model is a 12-billion-parameter multimodal model from Google's Gemma 3 family, built on the same research and technology as the Gemini models. It accepts both text and image inputs and generates text outputs, supports a context window of up to 128K tokens, and offers multilingual coverage of over 140 languages. The model is well suited to text generation and image understanding tasks such as question answering, summarization, and reasoning, and its relatively small size makes it practical for deployment in resource-limited environments.


Model Overview

This model is part of the Gemma 3 family, developed by Google DeepMind, leveraging the same research and technology as the Gemini models. It is a multimodal model, capable of processing both text and image inputs to generate text outputs. The model features open weights for both pre-trained and instruction-tuned variants.

Key Capabilities

  • Multimodal Input: Accepts text strings and images (normalized to 896x896 resolution, encoded to 256 tokens each).
  • Large Context Window: Supports a total input context of 128K tokens.
  • Multilingual Support: Trained on data including content in over 140 languages.
  • Diverse Task Performance: Excels in text generation and image understanding tasks such as question answering, summarization, and reasoning.
  • Resource-Efficient Deployment: Its relatively small size (12B parameters) makes it suitable for deployment on devices with limited resources like laptops, desktops, or private cloud infrastructure.
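To make the multimodal token accounting above concrete, here is a minimal sketch of how a mixed text-and-image prompt consumes the context window, assuming each image costs a flat 256 tokens and treating "128K" as 128,000 tokens for simplicity (the helper name is illustrative, not part of any official API):

```python
IMAGE_TOKENS = 256        # each image is normalized to 896x896 and encoded to 256 tokens
CONTEXT_WINDOW = 128_000  # total input context ("128K" taken as 128,000 here)

def prompt_budget(num_images: int, text_tokens: int) -> int:
    """Return the tokens remaining in the context window (illustrative helper)."""
    used = num_images * IMAGE_TOKENS + text_tokens
    return CONTEXT_WINDOW - used

# e.g. 4 images plus ~2,000 tokens of text still leaves most of the window free
remaining = prompt_budget(num_images=4, text_tokens=2_000)
print(remaining)  # 124976
```

Because every image maps to a fixed 256-token cost regardless of its original resolution, budgeting a prompt with many images is simple arithmetic rather than per-image estimation.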

Training Details

The 12B parameter variant was trained on 12 trillion tokens from a diverse dataset comprising web documents, code, mathematical texts, and a wide range of images. Rigorous data preprocessing included CSAM filtering, sensitive data filtering, and quality/safety filtering. Training was conducted on Tensor Processing Unit (TPU) hardware (TPUv4p, TPUv5p, and TPUv5e) using JAX and ML Pathways.

Benchmark Performance

The Gemma 3 PT 12B model demonstrates strong performance across various benchmarks:

  • Reasoning & Factuality: Achieves 84.2 on HellaSwag (10-shot) and 72.6 on BIG-Bench Hard (few-shot).
  • STEM & Code: Scores 74.5 on MMLU (5-shot) and 71.0 on GSM8K (8-shot).
  • Multilingual: Reaches 64.3 on MGSM and 69.4 on Global-MMLU-Lite.
  • Multimodal: Attains 111 on COCOcap and 82.3 on DocVQA (val).

Intended Usage

This model is designed for a broad range of applications, including:

  • Content Creation: Generating creative text formats, marketing copy, and email drafts.
  • Conversational AI: Powering chatbots and virtual assistants.
  • Text Summarization: Creating concise summaries of documents.
  • Image Data Extraction: Interpreting visual data and summarizing it as text.
  • Research & Education: Serving as a foundation for VLM/NLP research and language learning tools.
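For callers integrating the model into applications like those above, here is a hedged sketch of how a multimodal request body might be assembled in the OpenAI-compatible chat format that many serving stacks expose. The exact endpoint, request schema, and whether this deployment accepts this format are assumptions; only payload construction is shown, with no network call:

```python
import base64
import json

def build_vision_request(model: str, question: str, image_bytes: bytes) -> dict:
    """Assemble an OpenAI-style chat-completions payload with one inline image (illustrative)."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 512,
    }

payload = build_vision_request(
    "314e/abstrakthealth-rerun-VLM-Gemma3-Entity",
    "Summarize the chart in this image.",
    b"\x89PNG...",  # placeholder bytes; pass real image data in practice
)
print(json.dumps(payload)[:60])
```

The payload pairs a text part with a base64 data-URL image part in a single user message, which is how most OpenAI-compatible servers expect vision inputs to arrive.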