VAGOsolutions/SauerkrautLM-Gemma-2b

Text Generation · Concurrency Cost: 1 · Model Size: 2.5B · Quant: BF16 · Context Length: 8k · Published: Mar 6, 2024 · License: gemma-terms-of-use · Architecture: Transformer

VAGOsolutions/SauerkrautLM-Gemma-2b is a 2.5-billion-parameter instruction-tuned model developed jointly by VAGO solutions and Hyperspace.ai on top of Google's Gemma-2b architecture. It is fine-tuned for bilingual German and English use, applying a novel laser-QLoRA training technique to enhance performance, particularly in math abilities, while preserving prior knowledge. The model offers an 8192-token context length and is optimized for general language tasks with a focus on German language skills.


SauerkrautLM-Gemma-2b Overview

SauerkrautLM-Gemma-2b is an early-stage, instruction-tuned model developed by VAGO solutions and Hyperspace.ai, building upon Google's Gemma-2b. It is notable for its bilingual capabilities in German and English, making it one of the first Gemma-2b models with this specific focus.

Key Capabilities & Training Innovations

  • Bilingual Proficiency: Fine-tuned to add German language skills alongside the English capabilities inherited from the base model.
  • Novel Training Technique: Utilizes "laser-QLoRA" and "LaserRMT" approaches, which partially freeze the model based on a laser-like analysis of its weights. This method aims to optimize performance in targeted areas such as math while preventing the forgetting of previously acquired knowledge.
  • Targeted Optimization: Training involved iterations focused on enhancing specific capabilities such as MMLU, TQA, GSM8K, and Winogrande, with active monitoring to improve math abilities without degrading other benchmarks.
  • Vicuna Prompt Template: Trained with the Vicuna prompt template; clients must configure the appropriate stopping strings so that generation ends cleanly at turn boundaries.
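The Vicuna template uses a fixed USER/ASSISTANT turn format. As a minimal sketch of how a client might handle it (the system prompt and stop strings below are illustrative assumptions, not values taken from the model card), prompts can be formatted and raw completions truncated like this:

```python
# Sketch of Vicuna-style prompt formatting and client-side stop-string
# handling. The system prompt and stop strings are assumptions; check the
# model card for the exact values SauerkrautLM-Gemma-2b expects.

SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers."
)
STOP_STRINGS = ["</s>", "USER:"]  # assumed; configure per your client


def build_prompt(user_message: str) -> str:
    """Wrap a user message in the Vicuna turn format."""
    return f"{SYSTEM_PROMPT} USER: {user_message} ASSISTANT:"


def truncate_at_stop(completion: str, stops=tuple(STOP_STRINGS)) -> str:
    """Cut the raw completion at the earliest stop string, if any."""
    cut = len(completion)
    for stop in stops:
        idx = completion.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut].strip()


if __name__ == "__main__":
    print(build_prompt("Was ist die Hauptstadt von Deutschland?"))
    print(truncate_at_stop("Die Hauptstadt ist Berlin.</s> USER: ..."))
```

Truncating client-side is a fallback for inference stacks that do not support custom stop strings natively; where the serving API accepts a stop-string parameter, passing the same strings there is simpler.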

Performance & Considerations

While still a work in progress, the model averages 48.93 on the Open LLM Leaderboard, with individual scores including 42.06 on MMLU and 27.67 on GSM8K. Note that this is an alpha release: the developers caution that "strange behavior" and "not entirely correct" formulations may occasionally occur, and they are actively monitoring and improving the model.