VAGOsolutions/SauerkrautLM-Qwen-32b

32.5B parameters · FP8 · 32768-token context · License: tongyi-qianwen-research

SauerkrautLM-Qwen-32b Overview

VAGOsolutions/SauerkrautLM-Qwen-32b is a 32.5-billion-parameter language model developed collaboratively by VAGO solutions and Hyperspace.ai. It is built on Qwen/Qwen1.5-32B and refined through a two-stage pipeline of supervised fine-tuning followed by preference alignment to strengthen its German and English capabilities.
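The checkpoint can be loaded with the Hugging Face transformers library like any other Qwen1.5-derived causal LM. A minimal sketch; the dtype and device placement shown are illustrative, and a 32.5B model requires substantial GPU memory:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-Qwen-32b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the precision stored in the checkpoint
    device_map="auto",   # shard the 32.5B parameters across available GPUs
)
```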

Key Capabilities & Training

  • Bilingual Proficiency: The model is fine-tuned to perform well in both German and English, making it a pioneering Qwen 32B model for dual-language use. Training focused specifically on teaching the model German-language nuances.
  • Fine-tuning & Alignment: The model was first trained with Supervised Fine-Tuning (SFT) for 2 epochs on 160,000 samples, then aligned with Direct Preference Optimization (DPO) for 1 epoch on 110,000 preference pairs (see the sketch after this list).
  • Performance: On the Open LLM Leaderboard, SauerkrautLM-Qwen-32b achieved an average score of 73.11. Notable scores include 74.40 on MMLU (5-shot) and 79.53 on GSM8K (5-shot).
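For context on the alignment stage, below is a minimal, hypothetical sketch of a DPO run using the Hugging Face TRL library. Only the epoch count and sample size come from the model card; the starting checkpoint path, dataset, and hyperparameters such as beta are assumptions, since the exact training configuration has not been published:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_checkpoint = "path/to/sft-checkpoint"  # hypothetical: the model after the SFT stage
model = AutoModelForCausalLM.from_pretrained(sft_checkpoint)
tokenizer = AutoTokenizer.from_pretrained(sft_checkpoint)

# Preference data with "prompt", "chosen", "rejected" columns (illustrative file name).
prefs = load_dataset("json", data_files="dpo_pairs.jsonl", split="train")  # ~110k pairs

args = DPOConfig(
    output_dir="sauerkraut-dpo",
    num_train_epochs=1,  # 1 DPO epoch, per the model card
    beta=0.1,            # assumed KL-penalty strength; not published
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=prefs,
    processing_class=tokenizer,  # older TRL versions name this argument `tokenizer=`
)
trainer.train()
```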

Use Cases

  • Bilingual Applications: Ideal for applications requiring high-quality text generation and understanding in both German and English (see the generation sketch after this list).
  • Research & Development: Provides a strong base for further research into bilingual LLMs, particularly for German-English contexts.
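As an illustration of bilingual use, the sketch below continues from the loading snippet above and generates answers to one German and one English prompt. It assumes the tokenizer ships a chat template (Qwen1.5 checkpoints carry one in ChatML format); the prompts themselves are illustrative:

```python
prompts = [
    "Erkläre den Unterschied zwischen Kernfusion und Kernspaltung.",      # German
    "Explain the difference between nuclear fusion and nuclear fission.",  # English
]
for user_msg in prompts:
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": user_msg}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=256)
    # Strip the prompt tokens and print only the newly generated answer.
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```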

While the model demonstrates strong bilingual capabilities, the developers note that some German formulations may still need refinement, as the project is ongoing.