Overview
anakin87/gemma-2-2b-neogenesis-ita is a 2.6-billion-parameter Gemma 2 model, fine-tuned from google/gemma-2-2b-it by anakin87. It is optimized for strong Italian-language performance while retaining the base model's 8K context length. The model was trained with a combination of Instruction Fine-Tuning (IFT) and Direct Preference Optimization (DPO).
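A minimal inference sketch using the Hugging Face transformers library (assuming transformers and torch are installed; the Italian prompt is purely illustrative):

```python
# Minimal inference sketch; assumes `transformers` and `torch` are installed.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="anakin87/gemma-2-2b-neogenesis-ita",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Gemma 2 chat models expect the chat format; passing a list of messages
# lets the pipeline apply the model's chat template automatically.
messages = [{"role": "user", "content": "Spiegami brevemente cos'è il Rinascimento."}]
output = generator(messages, max_new_tokens=256)
print(output[0]["generated_text"][-1]["content"])
```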
Key Differentiators & Training
Training used Spectrum, a parameter-efficient technique that identifies the most informative layers of the model via signal-to-noise analysis and fine-tunes only those (here, the top 25%), freezing the rest. Because most of the base model's weights stay untouched, this also helps preserve its safety features. Training took approximately 15 hours on a single NVIDIA A6000 GPU. The fine-tuning leveraged a diverse set of Italian and some English datasets for both IFT and DPO, including efederici/capybara-claude-15k-ita and mii-llm/argilla-math-preferences-it.
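A rough sketch of the Spectrum idea, freezing everything except a chosen subset of layers; the layer indices below are placeholders, since the real method selects roughly the top 25% of layers by signal-to-noise ratio:

```python
# Illustrative sketch of Spectrum-style selective fine-tuning: freeze all
# parameters, then unfreeze only the layers selected as most informative.
# The indices here are hypothetical; the actual method picks ~25% of
# layers via a signal-to-noise analysis.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")

# Hypothetical set of "most informative" layer indices.
trainable_layers = {2, 7, 12, 18, 23, 25}

for name, param in model.named_parameters():
    # Train a parameter only if it belongs to one of the selected layers.
    param.requires_grad = any(f"layers.{i}." in name for i in trainable_layers)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable / total:.1%}")
```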
Performance
Evaluated on the Open Ita LLM Leaderboard, anakin87/gemma-2-2b-neogenesis-ita shows strong performance for its size on Italian benchmarks:
- MMLU_IT: 48.03
- HELLASWAG_IT: 56.97
- Average (over the leaderboard's three benchmarks: MMLU_IT, ARC_IT, and HELLASWAG_IT): 48.49
These scores surpass the base google/gemma-2-2b-it model and an earlier SFT checkpoint, highlighting its effectiveness in Italian language understanding and generation.
Use Cases
This model is well suited to applications that need strong Italian-language capabilities from a small, efficient model. Because of its modest parameter count, its world knowledge is limited; this can be mitigated with Retrieval-Augmented Generation (RAG), as in the sketch below.
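A minimal sketch of RAG-style prompting. The `retrieve` function is a hypothetical stand-in for any retrieval backend (BM25, a vector store, etc.), and the sample passage is illustrative:

```python
# Minimal RAG-style prompting sketch. `retrieve` is a hypothetical placeholder
# for a real retrieval backend; the passage it returns is made up for the demo.
from transformers import pipeline

generator = pipeline("text-generation", model="anakin87/gemma-2-2b-neogenesis-ita")

def retrieve(query: str) -> list[str]:
    # Placeholder: return passages from your corpus relevant to the query.
    return ["Il Colosseo fu inaugurato nell'80 d.C. sotto l'imperatore Tito."]

def answer_with_context(question: str) -> str:
    # Ground the model's answer in retrieved passages instead of its
    # (limited) parametric world knowledge.
    context = "\n".join(retrieve(question))
    prompt = (
        "Rispondi alla domanda usando solo il contesto fornito.\n\n"
        f"Contesto:\n{context}\n\nDomanda: {question}"
    )
    messages = [{"role": "user", "content": prompt}]
    result = generator(messages, max_new_tokens=200)
    return result[0]["generated_text"][-1]["content"]

print(answer_with_context("Quando fu inaugurato il Colosseo?"))
```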