Model Overview
VAGOsolutions/SauerkrautLM-gemma-2-2b-it is a 2.6-billion-parameter instruction-tuned model developed by VAGO solutions. It is based on google/gemma-2-2b-it and has been fine-tuned using a novel approach called Spectrum Fine-Tuning.
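For orientation, here is a minimal inference sketch using the standard Hugging Face transformers API. The model ID matches the repository name; the dtype and device settings are assumptions chosen as sensible defaults for Gemma 2, not documented requirements of this model.

```python
# Minimal usage sketch (not from the model card): load the model with
# transformers and generate a reply via the built-in chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 as a reasonable default
    device_map="auto",
)

messages = [{"role": "user", "content": "Erkläre kurz, was Spectrum Fine-Tuning ist."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```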
Key Characteristics & Training
- Resource-Efficient Fine-Tuning: The model showcases the potential of Spectrum Fine-Tuning by targeting only 25% of the model's layers, significantly reducing resource consumption compared to traditional full fine-tuning (see the illustrative sketch after this list).
- Multilingual Focus: Fine-tuned on a proprietary "German-English Sauerkraut Mix v2" dataset, which includes meticulously selected high-quality data and cutting-edge synthetic datasets.
- Enhanced Capabilities: The fine-tuning process has led to significant improvements in instruction-following, common-sense reasoning, and problem-solving. It also demonstrates enhanced multilingual performance, not only in German and English but across various other languages, as indicated by MMLU evaluations.
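The model card does not publish the actual Spectrum training code, so the sketch below only illustrates the core idea of the technique: freeze the whole network, then train a subset of roughly 25% of the decoder layers. In the real Spectrum method the layers are chosen by signal-to-noise analysis; the evenly spaced selection here is a placeholder for that step.

```python
# Illustrative sketch only: Spectrum-style selective fine-tuning freezes most
# parameters and trains ~25% of the transformer layers. The evenly spaced
# layer selection below stands in for Spectrum's signal-to-noise analysis.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "VAGOsolutions/SauerkrautLM-gemma-2-2b-it", torch_dtype=torch.bfloat16
)

# Freeze every parameter first.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze roughly a quarter of the decoder layers (placeholder selection).
layers = model.model.layers  # Gemma 2 exposes its decoder stack here
num_trainable = max(1, len(layers) // 4)
step = len(layers) / num_trainable
for i in range(num_trainable):
    for param in layers[int(i * step)].parameters():
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameter share: {trainable / total:.1%}")
```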
Objective
The primary objective of this model's development was to demonstrate that even a small 2-billion-parameter model can achieve enhanced capabilities through resource-efficient fine-tuning while preserving the majority of its pre-existing knowledge.
Evaluation Highlights
Evaluations on benchmarks such as AGIEval, GPT4All, TruthfulQA, the Open LLM Leaderboard 2, and MMLU (5-shot) indicate improved performance, particularly in multilingual contexts. VAGO solutions notes that its absolute benchmark scores may differ from those on the Hugging Face leaderboard due to varying evaluation setups, but the relative differences remain consistent.
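The card does not state which evaluation harness produced these scores (hence the caveat about differing setups). As one common way to reproduce an MMLU 5-shot number, here is a hedged sketch using EleutherAI's lm-evaluation-harness Python API; the exact result keys can vary between harness versions.

```python
# Assumption: EleutherAI's lm-evaluation-harness (pip install lm-eval) is used.
# The model card does not specify the harness behind the reported scores.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=VAGOsolutions/SauerkrautLM-gemma-2-2b-it,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,
)
print(results["results"]["mmlu"])  # aggregate metrics; exact keys vary by version
```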