SauerkrautLM-1.5b Overview
SauerkrautLM-1.5b is a 1.5-billion-parameter language model developed by VAGO solutions, built on the Qwen/Qwen2-1.5B architecture. Its primary differentiator is the use of Spectrum Continuous Pre-Training (CPT) on German data, targeting only 25% of the model's layers. This approach substantially reduces training resource consumption while still delivering a marked improvement in German language proficiency.
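The layer-targeted training behind Spectrum CPT can be illustrated with a short PyTorch/transformers sketch: freeze the whole base model, then re-enable gradients for roughly a quarter of the transformer blocks before continued pre-training on German text. The layer indices below are illustrative placeholders; the actual Spectrum method selects layers from a signal-to-noise analysis rather than by fixed position.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the base model that SauerkrautLM-1.5b starts from.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-1.5B", torch_dtype=torch.bfloat16)

# Freeze every parameter first.
for param in model.parameters():
    param.requires_grad = False

# Re-enable training for ~25% of the transformer blocks (7 of Qwen2-1.5B's 28 layers).
# These indices are hypothetical; Spectrum picks layers via an SNR scan, not by position.
target_layers = {3, 7, 11, 15, 19, 23, 27}
for idx, block in enumerate(model.model.layers):
    if idx in target_layers:
        for param in block.parameters():
            param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Updating {trainable / total:.1%} of all parameters during CPT")
```

Only the unfrozen blocks receive gradient updates during continued pre-training, which is what keeps the GPU cost far below a full-model CPT run.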
Key Capabilities & Training Insights
- Resource-Efficient Multilingualism: Achieves substantial improvements in German language skills with a fraction of the resources typically required for full CPT; the CPT stage, run over 6.1 billion German tokens, cost $1152 in rented GPU time.
- Performance: In German RAG evaluations it performs comparably to 8-billion-parameter models, and it matches or surpasses the base Qwen2-1.5B-Instruct model on some English benchmarks.
- Mobile Deployment: Its compact 1.5 billion parameter size makes it well-suited for deployment on smartphones and tablets.
- Training Process: After CPT, the model underwent three epochs of Supervised Fine-Tuning (SFT) on 700K samples and was then aligned with Direct Preference Optimization (DPO) on 70K samples; a sketch of these two stages follows this list.
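A hedged sketch of those two post-CPT stages using the TRL library is shown below; the dataset identifiers and hyperparameters are placeholders, not the data or settings actually used for SauerkrautLM-1.5b.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer, SFTConfig, SFTTrainer

base = "Qwen/Qwen2-1.5B"  # in practice this would be the German-CPT checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Stage 1: supervised fine-tuning (the overview reports 3 epochs over 700K samples).
# The dataset is hypothetical and must provide text in a format SFTTrainer accepts.
sft_data = load_dataset("your-org/german-sft-samples", split="train")
sft_trainer = SFTTrainer(
    model=model,
    train_dataset=sft_data,
    args=SFTConfig(output_dir="sauerkraut-sft", num_train_epochs=3),
)
sft_trainer.train()

# Stage 2: Direct Preference Optimization (the overview reports 70K samples).
# The dataset is hypothetical and needs "prompt", "chosen", and "rejected" columns.
dpo_data = load_dataset("your-org/german-dpo-pairs", split="train")
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,
    args=DPOConfig(output_dir="sauerkraut-dpo", beta=0.1),
    train_dataset=dpo_data,
    processing_class=tokenizer,  # named `tokenizer=` in older TRL releases
)
dpo_trainer.train()
```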
Why Use SauerkrautLM-1.5b?
- German Language Applications: Ideal for use cases requiring strong German language understanding and generation, especially where resource efficiency is critical.
- Edge Device Deployment: Its small size and optimized performance make it a strong candidate for on-device AI applications.
- Demonstration of Efficient Training: Serves as a practical example of how targeted CPT techniques can efficiently adapt an LLM to a new language without significantly degrading performance in its original language.
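For orientation, a minimal inference sketch is included below. The Hugging Face repository ID and the availability of a Qwen2-style chat template are assumptions; consult the official model card for the canonical usage example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-1.5b"  # assumed repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# German prompt, formatted with the tokenizer's chat template (assumed to be present).
messages = [{"role": "user", "content": "Erkläre kurz, was Continuous Pre-Training ist."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```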