tg-rising/gemma-3-12b-it-heretic-v2-MLX-BF16

VISIONConcurrency Cost:1Model Size:12BQuant:FP8Ctx Length:32kPublished:May 25, 2026Architecture:Transformer Cold

The tg-rising/gemma-3-12b-it-heretic-v2-MLX-BF16 is a 12 billion parameter instruction-tuned language model, based on Google's Gemma-3 architecture, specifically converted for MLX in BF16 precision. This model is a text-only variant, optimized for general text generation tasks. It offers a 32768 token context length and is derived from the DreamFast/gemma-3-12b-it-heretic-v2 model, focusing on efficient text processing within the MLX framework.

Loading preview...

Model Overview

This model, tg-rising/gemma-3-12b-it-heretic-v2-MLX-BF16, is a 12 billion parameter instruction-tuned language model. It is an MLX-converted, text-only variant of the DreamFast/gemma-3-12b-it-heretic-v2 model, which itself is based on Google's gemma-3-12b-it architecture. This specific version is provided in BF16 (BFloat16) precision, offering a balance between performance and memory footprint for text generation tasks.

Key Characteristics

  • Architecture: Based on the Gemma-3 family, specifically the 12B instruction-tuned variant.
  • Parameters: 12 billion parameters, providing substantial generative capabilities.
  • MLX Conversion: Optimized for Apple Silicon (MLX framework) for efficient local inference.
  • Precision: BF16 (BFloat16) quantization, offering higher fidelity compared to lower-bit quantized versions.
  • Text-Only: Designed exclusively for text generation and understanding, without multimodal capabilities.
  • Context Length: Supports a context window of 32768 tokens.

Use Cases

This model is particularly well-suited for:

  • General-purpose text generation.
  • Instruction-following tasks where text-based responses are required.
  • Applications leveraging the MLX framework on compatible hardware.
  • Developers seeking a high-precision, unquantized text model for MLX environments.