VAGOsolutions/SauerkrautLM-Gemma-7b
Text Generation · Concurrency Cost: 1 · Model Size: 8.5B · Quantization: FP8 · Context Length: 8K · Published: Feb 27, 2024 · License: gemma-terms-of-use · Architecture: Transformer

VAGO solutions and Hyperspace.ai present SauerkrautLM-Gemma-7b, an 8.5-billion-parameter instruction-tuned model based on Google's Gemma-7b architecture with an 8192-token context length. The model is fine-tuned with a novel laser-QLoRA technique and LaserRMT, with a focus on preventing catastrophic forgetting and strengthening mathematical ability. It is notable as one of the first Gemma models with strong bilingual capabilities in German and English, achieving an average of 67.83 on the H6 Open LLM Leaderboard and 54.13 across AGIEval, GPT4ALL, and BigBench.


SauerkrautLM-Gemma-7b: A Bilingual Gemma Fine-tune

VAGO solutions and Hyperspace.ai introduce SauerkrautLM-Gemma-7b, an 8.5-billion-parameter model built upon Google's Gemma-7b. The model is distinguished by its training methodology, which combines SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization) with a novel laser-QLoRA technique and LaserRMT.

Key Capabilities & Innovations

  • Novel Training Strategy: Utilizes laser-QLoRA and LaserRMT, partially freezing the model based on a laser-like analysis to optimize performance and prevent catastrophic forgetting, especially when teaching new skills or languages (see the freezing sketch after this list).
  • Bilingual Proficiency: Specifically trained to acquire German language skills, making it one of the first Gemma models to offer strong bilingual capabilities in both German and English.
  • Enhanced Mathematical Abilities: Through targeted data selection and continuous monitoring of perplexity on benchmarks such as GSM8K, the model shows improved math performance without degrading other benchmark scores (a perplexity-monitoring sketch also follows below).
  • Strong Benchmark Performance: Achieves an average of 67.83 on the H6 Open LLM Leaderboard and 54.13 across AGIEval, GPT4ALL, and BigBench, outperforming several comparable models including zephyr-7b-beta and google/gemma-7b-it in these combined metrics.
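
To illustrate the partial-freezing idea, here is a minimal sketch, not the authors' actual pipeline, of restricting QLoRA adapters to a subset of transformer layers using the Hugging Face peft library. The layer indices are hypothetical placeholders; the layers actually selected for SauerkrautLM-Gemma-7b are not specified here.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the Gemma-7b base model in 4-bit, QLoRA-style.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    quantization_config=bnb_config,
    device_map="auto",
)

# Hypothetical layer indices chosen by an offline analysis; all other
# layers stay frozen, which is how a laser-QLoRA-style setup guards
# against catastrophic forgetting.
selected_layers = [8, 12, 16, 20, 24]

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    layers_to_transform=selected_layers,  # adapt only the selected layers
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # confirms most weights remain frozen
```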

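To make the monitoring step concrete, the following sketch, an assumption about the workflow rather than the authors' published code, measures perplexity on a small GSM8K slice; a rising value between checkpoints would signal regressing math ability.

```python
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b"  # swap in the fine-tuned checkpoint under test
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# A small GSM8K slice is enough for trend monitoring between checkpoints.
dataset = load_dataset("gsm8k", "main", split="test[:16]")

losses = []
for example in dataset:
    text = example["question"] + "\n" + example["answer"]
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Causal LM loss with labels = inputs gives the mean token NLL.
        out = model(**enc, labels=enc["input_ids"])
    losses.append(out.loss.item())

print(f"GSM8K perplexity: {math.exp(sum(losses) / len(losses)):.2f}")
```
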
Important Considerations

  • Early Stage Development: The model is in an early stage of fine-tuning, and users may occasionally observe "lazy" or "odd" responses, indicating ongoing refinement.
  • Prompt Template: Use the Vicuna prompt template and include "</s>" and "</p>" as stopping strings (see the usage sketch after this list).
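
A minimal usage sketch with transformers (the stop_strings argument to generate() requires v4.39+). The Vicuna preamble below is the standard one, and the German question is an illustrative placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-Gemma-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Vicuna-style prompt, as the model card recommends.
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the "
    "user's questions.\n"
    "USER: Erkläre kurz, was ein Sprachmodell ist. ASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    stop_strings=["</s>", "</p>"],  # stopping strings from the model card
    tokenizer=tokenizer,            # generate() needs this for stop_strings
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```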

This model represents a significant step in exploring advanced fine-tuning techniques for language models, particularly in preserving broad intelligence while specializing in new domains such as bilingual use and targeted task performance.