m8than/gemma-2-9b-it

Hugging Face
Text generation · Concurrency cost: 1 · Model size: 9B · Quant: FP8 · Context length: 16k · Published: May 9, 2025 · License: gemma · Architecture: Transformer · Status: Warm

The m8than/gemma-2-9b-it model is a 9-billion-parameter instruction-tuned variant of Google's Gemma 2 architecture with a 16,384-token context length. This build is quantized to 4-bit and optimized for efficient fine-tuning with Unsloth, enabling faster training and reduced memory consumption. It is particularly well suited for developers who want to fine-tune a powerful Gemma 2 model quickly in resource-constrained environments such as Google Colab.
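As an instruction-tuned Gemma 2 model, it expects prompts in Gemma's turn-based chat format (`<start_of_turn>` / `<end_of_turn>` markers, with `user` and `model` roles and no separate system role). In practice you would call the tokenizer's `apply_chat_template`; a minimal hand-rolled sketch of the same formatting:

```python
def build_gemma_prompt(messages):
    """Format a list of {"role", "content"} dicts into the Gemma 2
    chat template. Gemma uses "user"/"model" roles, so "assistant"
    is mapped to "model"; a trailing model turn invites generation."""
    parts = ["<bos>"]
    for msg in messages:
        role = "model" if msg["role"] == "assistant" else "user"
        parts.append(f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # generation prompt
    return "".join(parts)
```

For real use, prefer `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`, which guarantees the exact template shipped with the model.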


Overview

This model, m8than/gemma-2-9b-it, is a 9 billion parameter instruction-tuned version of Google's Gemma 2, specifically optimized for efficient fine-tuning using the Unsloth library. It is provided as a directly quantized 4-bit model using bitsandbytes, making it highly memory-efficient.
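A sketch of loading the model in 4-bit with Unsloth and attaching LoRA adapters for memory-efficient fine-tuning. The hyperparameter values (`r`, `lora_alpha`, the target module list) are illustrative defaults, not settings from this model card; running this requires a CUDA GPU with `unsloth` and `bitsandbytes` installed:

```python
def load_gemma_4bit(max_seq_length=16384):
    """Load m8than/gemma-2-9b-it in 4-bit via Unsloth and attach
    LoRA adapters so only a small fraction of weights are trained."""
    # Imported lazily: unsloth needs a CUDA GPU at import time.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="m8than/gemma-2-9b-it",
        max_seq_length=max_seq_length,
        load_in_4bit=True,  # bitsandbytes 4-bit quantization
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,               # illustrative LoRA rank
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )
    return model, tokenizer
```

The returned model/tokenizer pair can then be handed to a standard TRL `SFTTrainer` loop, which is the workflow Unsloth's notebooks follow.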

Key Capabilities

  • Efficient Fine-tuning: Designed to fine-tune roughly 2× faster with about 63% less memory than standard methods, even on modest hardware such as a Tesla T4 GPU.
  • Resource-Friendly: Enables powerful LLM fine-tuning on free tiers of platforms like Google Colab.
  • Broad Compatibility: Supports various fine-tuning tasks including conversational models (ShareGPT ChatML / Vicuna templates), text completion, and DPO (Direct Preference Optimization).
  • Export Options: Fine-tuned models can be exported to GGUF, vLLM, or uploaded directly to Hugging Face.
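The conversational fine-tuning path above typically starts from ShareGPT-style data, whose turns use `from`/`value` keys rather than the `role`/`content` keys chat templates expect. A small normalization helper (field names assumed from the common ShareGPT schema):

```python
# Map ShareGPT speaker tags to standard chat roles.
ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}

def sharegpt_to_messages(conversations):
    """Convert ShareGPT-style turns ({"from": ..., "value": ...})
    into the role/content message list used by chat templates."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in conversations
    ]
```

Libraries such as Unsloth ship equivalent helpers (e.g. dataset standardization utilities), so in practice this conversion is usually a one-liner on the loaded dataset.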

Good For

  • Developers and researchers seeking to quickly and cost-effectively fine-tune a Gemma 2 model.
  • Projects requiring efficient training on limited GPU resources.
  • Experimenting with instruction-tuned models for various NLP tasks, from chat to text generation.