anakin87/gemma-2-2b-neogenesis-ita

Parameters: 2.6B · Precision: BF16 · Context length: 8,192 tokens · License: gemma · Available on Hugging Face
Overview

anakin87/gemma-2-2b-neogenesis-ita is a 2.6-billion-parameter Gemma 2 model, fine-tuned from google/gemma-2-2b-it by anakin87 to improve performance in Italian while retaining the base model's 8K context length. Training combined Instruction Fine-Tuning (IFT) with Direct Preference Optimization (DPO).
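A minimal inference sketch using the transformers library (the prompt and generation parameters below are illustrative, not taken from the model card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anakin87/gemma-2-2b-neogenesis-ita"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is distributed in BF16
    device_map="auto",
)

# Gemma 2 chat models use a user/assistant template with no system role.
messages = [{"role": "user", "content": "Spiegami in breve cos'è il Rinascimento."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```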

Key Differentiators & Training

This model was trained with Spectrum, a parameter-efficient technique that ranks layers by signal-to-noise ratio, trains only the most informative ones (here, the top 25%), and freezes the rest. Keeping most of the base model frozen also helps preserve its safety behavior. Training took approximately 15 hours on a single NVIDIA A6000 GPU and drew on a mix of Italian and some English datasets for both IFT and DPO, including efederici/capybara-claude-15k-ita and mii-llm/argilla-math-preferences-it.
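As a rough illustration of the Spectrum idea, the sketch below freezes every parameter and then re-enables gradients only for modules matching a hand-picked set of name patterns; the patterns are hypothetical stand-ins for what an actual Spectrum scan of this model would select:

```python
import re
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")

# Hypothetical output of a Spectrum scan: name patterns for the most
# informative modules. The real scan for this model may differ.
trainable_patterns = [
    r"model\.layers\.(1[89]|2[0-5])\.self_attn",
    r"model\.layers\.(1[89]|2[0-5])\.mlp",
]

# Freeze everything, then unfreeze only the selected modules.
for name, param in model.named_parameters():
    param.requires_grad = any(re.search(p, name) for p in trainable_patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} parameters ({trainable / total:.1%})")
```

Training would then proceed with a standard trainer (e.g., TRL's SFTTrainer or DPOTrainer) updating only the unfrozen subset.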

Performance

Evaluated on the Open Ita LLM Leaderboard, anakin87/gemma-2-2b-neogenesis-ita posts strong scores for its size on Italian benchmarks:

  • MMLU_IT: 48.03
  • HELLASWAG_IT: 56.97
  • Average: 48.49 (over the leaderboard's full benchmark suite, not just the two scores above)

These scores surpass those of the base google/gemma-2-2b-it model and of an earlier SFT-only checkpoint, highlighting the fine-tune's effectiveness at Italian language understanding and generation.

Use Cases

This model is a good fit for applications that need strong Italian-language capabilities from a small, efficient model. Given its modest parameter count, its world knowledge is limited; grounding it with Retrieval-Augmented Generation (RAG) can compensate, as sketched below.
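A minimal RAG sketch of that pattern: retrieve the most relevant passage from a corpus and prepend it to the prompt before generation. The embedding model and the toy corpus are illustrative choices, not something the model card specifies:

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative multilingual embedding model and toy corpus; any retriever works.
retriever = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
corpus = [
    "Il Colosseo fu inaugurato nell'80 d.C. sotto l'imperatore Tito.",
    "La Divina Commedia fu scritta da Dante Alighieri all'inizio del XIV secolo.",
]
corpus_emb = retriever.encode(corpus, convert_to_tensor=True)

question = "Quando fu inaugurato il Colosseo?"
query_emb = retriever.encode(question, convert_to_tensor=True)
best_hit = util.semantic_search(query_emb, corpus_emb, top_k=1)[0][0]

# Ground the model's answer in the retrieved passage: this prompt would be
# sent to gemma-2-2b-neogenesis-ita as the user message.
prompt = f"Contesto: {corpus[best_hit['corpus_id']]}\n\nDomanda: {question}\nRisposta:"
print(prompt)
```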