VAGOsolutions/SauerkrautLM-Gemma-2b

Text Generation · Concurrency Cost: 1 · Model Size: 2.5B · Quant: BF16 · Context Length: 8k · Published: Mar 6, 2024 · License: gemma-terms-of-use · Architecture: Transformer

VAGOsolutions/SauerkrautLM-Gemma-2b is a 2.5-billion-parameter instruction-tuned model developed jointly by VAGO solutions and Hyperspace.ai on top of Google's Gemma-2b architecture. It is fine-tuned for bilingual German and English use, applying a novel laser-QLoRA training technique to enhance performance, particularly in math abilities, while preserving prior knowledge. The model offers an 8192-token context length and is optimized for general language tasks with a focus on German language skills.


SauerkrautLM-Gemma-2b Overview

SauerkrautLM-Gemma-2b is an early-stage, instruction-tuned model developed by VAGO solutions and Hyperspace.ai, building upon Google's Gemma-2b. It is notable for its bilingual capabilities in German and English, making it one of the first Gemma-2b models with this specific focus.

Key Capabilities & Training Innovations

  • Bilingual Proficiency: Fine-tuned to add German language skills alongside the English capabilities inherited from the base model.
  • Novel Training Technique: Utilizes "laser-QLoRA" and "LaserRMT" approaches, which partially freeze the model based on a laser-like analysis of its weights. This method aims to optimize performance in targeted areas such as math while preventing the forgetting of previously acquired knowledge.
  • Targeted Optimization: Training involved iterations focused on enhancing specific capabilities such as MMLU, TQA, GSM8K, and Winogrande, with active monitoring to improve math abilities without degrading other benchmarks.
  • Vicuna Prompt Template: Trained with the Vicuna prompt template; clients must configure the appropriate stopping strings so that generation ends cleanly at turn boundaries.
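The Vicuna template uses a fixed USER/ASSISTANT turn format. As a minimal sketch of how a client might handle it (the system prompt and stop strings below are illustrative assumptions, not values taken from the model card), prompts can be formatted and raw completions truncated like this:

```python
# Sketch of Vicuna-style prompt formatting and client-side stop-string
# handling. The system prompt and stop strings are assumptions; check the
# model card for the exact values SauerkrautLM-Gemma-2b expects.

SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers."
)
STOP_STRINGS = ["</s>", "USER:"]  # assumed; configure per your client


def build_prompt(user_message: str) -> str:
    """Wrap a user message in the Vicuna turn format."""
    return f"{SYSTEM_PROMPT} USER: {user_message} ASSISTANT:"


def truncate_at_stop(completion: str, stops=tuple(STOP_STRINGS)) -> str:
    """Cut the raw completion at the earliest stop string, if any."""
    cut = len(completion)
    for stop in stops:
        idx = completion.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut].strip()


if __name__ == "__main__":
    print(build_prompt("Was ist die Hauptstadt von Deutschland?"))
    print(truncate_at_stop("Die Hauptstadt ist Berlin.</s> USER: ..."))
```

Truncating client-side is a fallback for inference stacks that do not support custom stop strings natively; where the serving API accepts a stop-string parameter, passing the same strings there is simpler.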

Performance & Considerations

While still a work in progress, the model averages 48.93 on the Open LLM Leaderboard, with individual scores including 42.06 on MMLU and 27.67 on GSM8K. Note that this is an alpha release: the developers caution that "strange behavior" and "not entirely correct" formulations may occasionally occur, and they are actively monitoring and improving the model.