namgyu-youn/gemma-3-27b-it-AWQ-INT4-v2
Vision · Concurrency Cost: 2 · Model Size: 27B · Quant: INT4 (AWQ) · Ctx Length: 32k · Published: Mar 23, 2026 · Architecture: Transformer

namgyu-youn/gemma-3-27b-it-AWQ-INT4-v2 is a 27-billion-parameter instruction-tuned Gemma-3 model quantized to INT4 with AWQ. The quantization is aimed at efficient deployment and inference, trading a much smaller memory footprint for little loss in quality: the model reaches a 0.9 exact match score on the GSM8K benchmark, matching the original unquantized model.


Overview

This model, namgyu-youn/gemma-3-27b-it-AWQ-INT4-v2, is a 27 billion parameter instruction-tuned variant of the Gemma-3 architecture. It has been quantized to INT4 using the AWQ (Activation-aware Weight Quantization) method, making it highly efficient for deployment while aiming to preserve performance.
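As a sketch, an AWQ-quantized checkpoint like this can typically be served with vLLM, which has built-in AWQ support. The command below is an illustrative assumption (the model card does not prescribe a serving stack), and the flags may need adjusting for your vLLM version and GPU memory budget:

```shell
# Serve the INT4 AWQ checkpoint with an OpenAI-compatible API (illustrative).
# --max-model-len matches the 32k context length listed above.
vllm serve namgyu-youn/gemma-3-27b-it-AWQ-INT4-v2 \
  --quantization awq \
  --max-model-len 32768
```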

Key Capabilities

  • Efficient Inference: Quantized to INT4, significantly reducing memory footprint and accelerating inference speed compared to its full-precision counterpart.
  • Mathematical Reasoning: Achieves an exact match score of 0.9 on the GSM8K benchmark (evaluated with a 10-example limit), matching the original google/gemma-3-27b-it model on the same subset and demonstrating strong mathematical problem-solving.
  • Instruction Following: As an instruction-tuned model, it is designed to understand and execute a wide range of user prompts and instructions effectively.

Good for

  • Applications requiring a powerful language model with reduced memory and computational requirements.
  • Edge device deployment or environments with limited GPU resources.
  • Mathematical reasoning and general instruction-following tasks where near-original Gemma-3 27B IT quality is desired at lower memory and compute cost.
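To make the memory savings concrete, a back-of-the-envelope weight-size estimate (parameter count and byte widths only; real deployments add overhead for activations, KV cache, and quantization scales):

```python
PARAMS = 27e9  # ~27B parameters


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a given precision."""
    return params * bits_per_weight / 8 / 1e9


fp16_gb = weight_gb(PARAMS, 16)  # half-precision baseline
int4_gb = weight_gb(PARAMS, 4)   # after AWQ INT4 quantization
print(f"FP16: {fp16_gb:.1f} GB, INT4: {int4_gb:.1f} GB")
```

Roughly 54 GB of weights at FP16 versus about 13.5 GB at INT4, which is what moves a 27B model from multi-GPU territory into range of a single 24 GB card.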