mshojaei77/Gemma-2-2b-fa

Hugging Face
Text Generation · Model Size: 2.6B · Quant: BF16 · Ctx Length: 8k · Published: Mar 4, 2025 · License: MIT · Architecture: Transformer · Open Weights

mshojaei77/Gemma-2-2b-fa is an experimental 2.6 billion parameter model, fine-tuned from Google's Gemma-2-2b-it using QLoRA. It is specifically adapted for Persian language conversational tasks, leveraging the mshojaei77/Persian_sft dataset. This model is an early-stage proof-of-concept for research and experimentation in Persian AI, designed for text generation in conversational applications.


Persian Gemma 2b: An Experimental Conversational AI

mshojaei77/Gemma-2-2b-fa is an early-stage experimental model derived from Google's Gemma-2-2b-it, fine-tuned using QLoRA for Persian language conversational tasks. With 2.6 billion parameters and an 8192-token context length, it inherits the efficient architecture of the base Gemma model.
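Loading and prompting the model follows the standard Hugging Face `transformers` chat workflow. The sketch below is illustrative and not part of the model card; generation settings such as `max_new_tokens` are assumptions, and, per the limitations noted later, output quality will be highly variable.

```python
MODEL_ID = "mshojaei77/Gemma-2-2b-fa"

def build_messages(user_message: str) -> list:
    # Gemma-2 chat checkpoints expect a plain user turn; the family
    # does not define a separate system role.
    return [{"role": "user", "content": user_message}]

def generate_reply(user_message: str, max_new_tokens: int = 128) -> str:
    # Heavy imports are kept inside the function so the helper above
    # can be used without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Render the conversation through the tokenizer's chat template,
    # which applies Gemma's turn markers.
    prompt = tokenizer.apply_chat_template(
        build_messages(user_message), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )

# Example (downloads the weights on first call):
# print(generate_reply("سلام! خودت را معرفی کن."))  # "Hi! Introduce yourself."
```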

Key Characteristics & Training:

  • Base Model: google/gemma-2-2b-it
  • Fine-tuning: QLoRA (Quantized Low-Rank Adaptation) for parameter-efficient training.
  • Dataset: mshojaei77/Persian_sft, a collection of Persian conversations for instruction fine-tuning.
  • Language: Primarily Persian (fa).
  • Critical Note: The model was trained for only 20 steps, making it a proof-of-concept with significantly under-optimized performance and no formal evaluation.
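A QLoRA run like the one described above can be sketched with the common `bitsandbytes` + `peft` + `trl` stack. Only the base model, the dataset, and the 20-step budget come from the model card; every hyperparameter below (LoRA rank, alpha, learning rate, batch size, target modules) is an illustrative assumption, not the author's actual recipe.

```python
# Facts from the model card:
BASE_MODEL = "google/gemma-2-2b-it"
DATASET = "mshojaei77/Persian_sft"
MAX_STEPS = 20  # the severely limited training budget noted above

def train():
    # Imported lazily so the constants above are usable without the ML stack.
    import torch
    from datasets import load_dataset
    from peft import LoraConfig
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from trl import SFTConfig, SFTTrainer

    # QLoRA step 1: load the frozen base model quantized to 4-bit NF4.
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, quantization_config=bnb, device_map="auto"
    )
    # QLoRA step 2: train only small low-rank adapter matrices on top.
    lora = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )
    trainer = SFTTrainer(
        model=model,
        train_dataset=load_dataset(DATASET, split="train"),
        peft_config=lora,
        args=SFTConfig(
            max_steps=MAX_STEPS, learning_rate=2e-4,
            per_device_train_batch_size=2, output_dir="gemma-2-2b-fa",
        ),
    )
    trainer.train()
```

Because the 4-bit base weights stay frozen and only the adapters receive gradients, a run of this shape fits on a single consumer GPU, which is what makes QLoRA attractive for small experiments like this one.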

Intended Use Cases:

  • Research & Experimentation: Investigating the feasibility of fine-tuning Gemma for Persian conversational AI.
  • Educational Purposes: Demonstrating QLoRA fine-tuning techniques and Persian language model development.
  • Prototyping (with caution): Exploring potential applications, acknowledging its preliminary state and limitations.

Limitations:

Due to severe under-training, the model exhibits:

  • Sub-optimal Performance: Limited fluency and coherence, with a tendency to hallucinate.
  • Bias: Likely inherits and amplifies biases from its base model and dataset.
  • Poor Generalization: Performance degrades significantly outside the training distribution.
  • No Formal Evaluation: Performance metrics are unavailable, and output quality is highly variable.