namgyu-youn/gemma-3-27b-it-AWQ-INT4

Vision · Concurrency cost: 2 · Model size: 27B · Quantization: AWQ INT4 · Context length: 32k · Published: Feb 23, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

namgyu-youn/gemma-3-27b-it-AWQ-INT4 is a 27-billion-parameter instruction-tuned Gemma model, published by namgyu-youn and quantized with AWQ INT4 for efficient deployment. The 4-bit weight-only quantization reduces the memory footprint and speeds up inference, making the model suitable for environments with limited computational resources while preserving the core capabilities of the Gemma-3-27b-it base model.


Model Overview

namgyu-youn/gemma-3-27b-it-AWQ-INT4 is a 27-billion-parameter instruction-tuned Gemma model derived from google/gemma-3-27b-it. It was quantized with the AWQ (Activation-aware Weight Quantization) INT4 method, using torchao v0.16.0 for the quantization process. The primary goal of this quantization is to significantly reduce the model's memory footprint and improve inference speed, making it more accessible for deployment on hardware with constrained resources; the original configuration indicates H100-class or newer GPUs as the target.
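To make the footprint reduction concrete, here is a rough back-of-the-envelope comparison of weight memory for 27B parameters in BF16 versus 4-bit with per-group scales. The figures are approximate and ignore activations, the KV cache, and non-quantized layers; the group size of 128 is a common convention assumed here, not a value taken from this checkpoint's config.

```python
# Approximate weight-memory comparison for a 27B-parameter model.
# Illustrative only: real checkpoints carry extra overhead (embeddings,
# non-quantized layers, metadata).
params = 27e9

bf16_gb = params * 2 / 1e9    # 2 bytes per weight
int4_gb = params * 0.5 / 1e9  # 4 bits = 0.5 bytes per weight

# Group-wise quantization stores one scale per group of weights;
# group size 128 is an assumed typical value.
group_size = 128
scales_gb = (params / group_size) * 2 / 1e9  # BF16 scales

print(f"BF16 weights:  {bf16_gb:.1f} GB")   # 54.0 GB
print(f"INT4 weights:  {int4_gb:.1f} GB")   # 13.5 GB
print(f"INT4 + scales: {int4_gb + scales_gb:.2f} GB")
```

Even with the per-group scale overhead, the quantized weights need roughly a quarter of the BF16 memory, which is what makes single-GPU deployment of a 27B model practical.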

Key Characteristics

  • Quantization: Utilizes AWQ INT4 (4-bit weight-only quantization) with torchao for efficiency.
  • Base Model: Built upon the google/gemma-3-27b-it instruction-tuned architecture.
  • Parameter Count: 27 billion parameters.
  • Context Length: Supports a context length of 32768 tokens.
  • Deployment Focus: Designed for efficient inference, particularly on compatible hardware like H100+ GPUs.
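The 4-bit weight-only scheme listed above can be illustrated with a minimal NumPy sketch of symmetric per-group INT4 quantization. Note that this deliberately omits AWQ's defining step, the activation-aware scaling that protects salient weight channels; it shows only the basic group-wise round-to-4-bit mechanics.

```python
import numpy as np

def quantize_int4(w: np.ndarray, group_size: int = 128):
    """Symmetric per-group INT4 quantization of a 1-D weight vector."""
    assert w.size % group_size == 0
    groups = w.reshape(-1, group_size)
    # One scale per group, mapping the largest magnitude to 7 (max INT4 value).
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero groups
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_int4(w)
w_hat = dequantize_int4(q, scale)
# Per-group rounding error is bounded by half the group's scale.
print(f"max reconstruction error: {np.abs(w - w_hat).max():.4f}")
```

AWQ improves on this plain scheme by rescaling weight channels according to activation statistics before rounding, which is why it typically loses less accuracy than naive round-to-nearest INT4.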

Usage and Limitations

The model is intended for tasks requiring the capabilities of the Gemma-3-27b-it model but with a focus on reduced resource consumption. The README provides a reproduction script for generating this quantized checkpoint and includes benchmark attempts for accuracy (using lm-eval on gsm8k) and throughput (using vLLM). However, it notes that both lm-eval (v0.4.11) and vLLM (v0.15.1) failed to reproduce expected results during benchmarking, indicating potential compatibility issues or specific environment requirements for evaluation. Users should be aware of these reported benchmarking challenges and may need to adapt their evaluation setups accordingly.
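For orientation, an evaluation setup along the lines the README describes might look like the following. The exact flags are illustrative assumptions, not the checkpoint's verified reproduction script, and the README reports that comparable runs did not reproduce expected results with lm-eval v0.4.11 and vLLM v0.15.1, so adjustments to versions or environment may be needed.

```shell
# Accuracy: gsm8k via lm-eval with the vLLM backend (illustrative flags).
lm_eval --model vllm \
  --model_args pretrained=namgyu-youn/gemma-3-27b-it-AWQ-INT4 \
  --tasks gsm8k \
  --batch_size auto

# Throughput: serve the quantized checkpoint with vLLM,
# using the model's 32k context length.
vllm serve namgyu-youn/gemma-3-27b-it-AWQ-INT4 --max-model-len 32768
```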