VAGOsolutions/SauerkrautLM-7b-LaserChat

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Feb 5, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

VAGOsolutions/SauerkrautLM-7b-LaserChat is a 7 billion parameter language model developed by VAGO solutions and Hyperspace.ai, fine-tuned from openchat/openchat-3.5-0106 with a 4096-token context length. It was trained with a novel technique called LaserRMT, which partially freezes the model according to a laser-like (layer-selective) analysis, combined with Spherical Linear Interpolation, to prevent catastrophic forgetting and enhance specific skills. The model is optimized for improved performance in both German and English, particularly in mathematical reasoning, while maintaining general capability.


SauerkrautLM-7b-LaserChat: A Novel Approach to LLM Training

VAGOsolutions/SauerkrautLM-7b-LaserChat is a 7 billion parameter model, a collaborative effort between VAGO solutions and Hyperspace.ai, built upon the strong foundation of openchat/openchat-3.5-0106. It distinguishes itself through a novel training strategy: LaserRMT, a layer-selective freezing technique, used together with Spherical Linear Interpolation to address common fine-tuning challenges such as catastrophic forgetting.

Key Training Innovations

  • LaserRMT Integration: The model employs a unique method of partially freezing its parameters based on a "laser-like" (layer-selective) analysis during training. This technique, developed by the laserRMT research group, aims to work around the trade-off described by the "no free lunch" theorem, preventing catastrophic forgetting when teaching the model new skills or languages (a minimal sketch of the idea appears after this list).
  • Targeted Skill Enhancement: The training procedure developed multiple iterations using subsets of the SFT and DPO datasets. This allowed focused improvements on benchmarks such as MMLU, TruthfulQA (TQA), GSM8K, and Winogrande, yielding enhanced performance, particularly in math, without degrading other benchmarks.
  • Multilingual Focus: While primarily English-based, the model received targeted improvements to its German language skills.
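
The model card does not publish the exact LaserRMT procedure, but its core idea can be sketched: score each weight matrix spectrally and freeze the ones whose singular values suggest they already carry learned "signal", leaving noisier matrices trainable. The snippet below is a minimal illustration, not VAGO solutions' actual code; the Marchenko-Pastur-style signal-to-noise score loosely follows the open-source laserRMT project, and the variance estimate, threshold, and layer-matching rule are all assumptions.

```python
import torch
import torch.nn as nn

def mp_snr(weight: torch.Tensor) -> float:
    """Heuristic signal-to-noise ratio of a 2-D weight matrix.

    Singular values above the Marchenko-Pastur bulk edge are treated as
    signal, the rest as noise (in the spirit of the laserRMT project).
    """
    W = weight.float()
    m, n = W.shape
    sigma = torch.linalg.svdvals(W)  # descending singular values
    # Crude noise-variance estimate from the smallest quartile of
    # singular values (an assumption, not the published method).
    est_var = (sigma[-max(1, len(sigma) // 4):] ** 2).mean() / max(m, n)
    # Largest singular value of an m x n i.i.d. noise matrix is roughly
    # sqrt(var) * (sqrt(m) + sqrt(n)).
    mp_edge = torch.sqrt(est_var) * (m ** 0.5 + n ** 0.5)
    signal = sigma[sigma > mp_edge]
    noise = sigma[sigma <= mp_edge]
    if len(noise) == 0:
        return float("inf")
    return (signal.sum() / noise.sum()).item() if len(signal) else 0.0

def freeze_high_snr_layers(model: nn.Module, threshold: float = 1.0) -> None:
    """Partially freeze the model: weight matrices whose SNR exceeds the
    (illustrative) threshold are assumed to already encode skills and are
    excluded from fine-tuning."""
    for name, param in model.named_parameters():
        if param.ndim == 2 and "weight" in name and mp_snr(param.data) > threshold:
            param.requires_grad = False  # frozen during fine-tuning
```

In the actual training run this analysis was reportedly interleaved with SFT and DPO iterations on dataset subsets; the threshold above is purely illustrative, and running full SVDs over a 7B model is expensive, so a faithful implementation would scan layers selectively as the laserRMT repository does.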

Performance Highlights

Evaluations on the Open LLM Leaderboard show competitive performance:

  • Avg.: 70.32
  • GSM8K (5-shot): 68.84
  • MMLU (5-shot): 64.93
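
These scores come from the Open LLM Leaderboard harness. For readers who want to check a number locally, below is a hedged sketch using the Python API of EleutherAI's lm-evaluation-harness (v0.4+ assumed); the task name and few-shot count match the 5-shot setting above, but the exact harness version and generation settings may differ from the leaderboard's, so results can deviate slightly.

```python
# pip install lm-eval  (EleutherAI lm-evaluation-harness)
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=VAGOsolutions/SauerkrautLM-7b-LaserChat,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,  # matches the 5-shot leaderboard setting
)
print(results["results"]["gsm8k"])
```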

When to Consider This Model

SauerkrautLM-7b-LaserChat is particularly suitable for use cases that need a 7B parameter model with strong general capabilities, enhanced mathematical reasoning, and improved German language understanding. Because its training method was designed to prevent knowledge degradation during fine-tuning, it is a robust choice for applications where retaining broad capability alongside specialized skills is critical. A minimal inference sketch follows.
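
Since the model is fine-tuned from openchat/openchat-3.5-0106, it inherits OpenChat's "GPT4 Correct" conversation format, which the tokenizer's built-in chat template should apply automatically (an assumption based on the base model; if the template is missing, format prompts manually as "GPT4 Correct User: ...<|end_of_turn|>GPT4 Correct Assistant:"). The generation parameters below are illustrative, not recommended settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-7b-LaserChat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The chat template applies the OpenChat "GPT4 Correct" format
# inherited from openchat-3.5-0106.
messages = [{"role": "user", "content": "Was ist die Ableitung von x^2?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```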