VAGOsolutions/SauerkrautLM-Gemma-7b
Text Generation · Concurrency Cost: 1 · Model Size: 8.5B · Quantization: FP8 · Context Length: 8K · Published: Feb 27, 2024 · License: gemma-terms-of-use · Architecture: Transformer

VAGO solutions and Hyperspace.ai present SauerkrautLM-Gemma-7b, an 8.5-billion-parameter instruction-tuned model based on Google's Gemma-7b architecture with an 8192-token context length. The model is fine-tuned with a novel laser-QLoRA technique and LaserRMT, with a focus on preventing catastrophic forgetting and strengthening mathematical ability. It is notable as one of the first Gemma models with strong bilingual capabilities in German and English, achieving an average of 67.83 on the H6 Open LLM Leaderboard and 54.13 across AGIEval, GPT4ALL, and BigBench.


SauerkrautLM-Gemma-7b: A Bilingual Gemma Fine-tune

VAGO solutions and Hyperspace.ai introduce SauerkrautLM-Gemma-7b, an 8.5-billion-parameter model built upon Google's Gemma-7b. The model is distinguished by its training methodology, which combines SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization) with a novel laser-QLoRA technique and LaserRMT.

Key Capabilities & Innovations

  • Novel Training Strategy: Utilizes laser-QLoRA and LaserRMT, partially freezing the model based on a laser-like analysis to optimize performance and prevent catastrophic forgetting, especially when teaching new skills or languages (see the freezing sketch after this list).
  • Bilingual Proficiency: Specifically trained to acquire German language skills, making it one of the first Gemma models to offer strong bilingual capabilities in both German and English.
  • Enhanced Mathematical Abilities: Through targeted data selection and continuous monitoring of perplexity on benchmarks such as GSM8K, the model shows improved math performance without degrading other benchmark scores (a perplexity-monitoring sketch also follows below).
  • Strong Benchmark Performance: Achieves an average of 67.83 on the H6 Open LLM Leaderboard and 54.13 across AGIEval, GPT4ALL, and BigBench, outperforming several comparable models including zephyr-7b-beta and google/gemma-7b-it in these combined metrics.
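
To illustrate the partial-freezing idea, here is a minimal sketch, not the authors' actual pipeline, of restricting QLoRA adapters to a subset of transformer layers using the Hugging Face peft library. The layer indices are hypothetical placeholders; the layers actually selected for SauerkrautLM-Gemma-7b are not specified here.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the Gemma-7b base model in 4-bit, QLoRA-style.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    quantization_config=bnb_config,
    device_map="auto",
)

# Hypothetical layer indices chosen by an offline analysis; all other
# layers stay frozen, which is how a laser-QLoRA-style setup guards
# against catastrophic forgetting.
selected_layers = [8, 12, 16, 20, 24]

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    layers_to_transform=selected_layers,  # adapt only the selected layers
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # confirms most weights remain frozen
```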

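To make the monitoring step concrete, the following sketch, an assumption about the workflow rather than the authors' published code, measures perplexity on a small GSM8K slice; a rising value between checkpoints would signal regressing math ability.

```python
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b"  # swap in the fine-tuned checkpoint under test
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# A small GSM8K slice is enough for trend monitoring between checkpoints.
dataset = load_dataset("gsm8k", "main", split="test[:16]")

losses = []
for example in dataset:
    text = example["question"] + "\n" + example["answer"]
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Causal LM loss with labels = inputs gives the mean token NLL.
        out = model(**enc, labels=enc["input_ids"])
    losses.append(out.loss.item())

print(f"GSM8K perplexity: {math.exp(sum(losses) / len(losses)):.2f}")
```
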
Important Considerations

  • Early Stage Development: The model is in an early stage of fine-tuning, and users may occasionally observe "lazy" or "odd" responses, indicating ongoing refinement.
  • Prompt Template: Use the Vicuna prompt template and include "</s>" and "</p>" as stopping strings (see the usage sketch after this list).
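
A minimal usage sketch with transformers (the stop_strings argument to generate() requires v4.39+). The Vicuna preamble below is the standard one, and the German question is an illustrative placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-Gemma-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Vicuna-style prompt, as the model card recommends.
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the "
    "user's questions.\n"
    "USER: Erkläre kurz, was ein Sprachmodell ist. ASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    stop_strings=["</s>", "</p>"],  # stopping strings from the model card
    tokenizer=tokenizer,            # generate() needs this for stop_strings
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```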

This model represents a significant step in exploring advanced fine-tuning techniques for language models, particularly in preserving broad intelligence while specializing in new domains such as bilingual use and targeted task performance.