SauerkrautLM-v2-14b-SFT: Advanced Fine-Tuning for Enhanced Performance
VAGO solutions introduces SauerkrautLM-v2-14b-SFT, a 14.8-billion-parameter instruction-tuned model built on Qwen/Qwen2.5-14B. This release advances the team's fine-tuning methodology with a novel two-phase Spectrum Fine-Tuning approach, in which each phase trains only a targeted subset of the network's layers.
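For orientation, a minimal inference sketch with Hugging Face transformers follows. The repository ID VAGOsolutions/SauerkrautLM-v2-14b-SFT and the chat-template call are assumed to follow standard Hugging Face and Qwen2.5 conventions and should be checked against the actual model repository; the prompt and generation settings are illustrative only.

```python
# Minimal inference sketch; the repo ID and chat-template usage are assumed
# to follow standard Hugging Face / Qwen2.5 conventions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-v2-14b-SFT"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# German math prompt to exercise the multilingual and math capabilities.
messages = [{"role": "user", "content": "Löse die Gleichung 3x + 7 = 22."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```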
Key Capabilities and Training:
- Two-Phase Spectrum Fine-Tuning: Training proceeds in two distinct phases, each targeting a specific subset of layers and a specific data mixture. Phase 1 targets 25% of the model's layers, Phase 2 targets 20%, and each phase trains on 0.6 billion tokens (a conceptual sketch of the layer targeting follows this list).
- Enhanced Mathematical Reasoning: Mathematics data curated with a proprietary classification model is included in training, improving the model's mathematical problem-solving.
- Robust Function Calling: Specialized function-calling data is integrated into both training phases, strengthening the model's tool-use proficiency.
- Multilingual Performance: Training on high-quality German and English data from both Sauerkraut-v1 and Sauerkraut-v2 supports strong performance in both languages.
- Instruction Following & Common-Sense Reasoning: The model shows significant improvements in following instructions and applying common-sense reasoning.
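To make the layer-targeting idea concrete, here is a conceptual sketch of how a Spectrum-style phase could freeze all but a chosen fraction of modules. This is not VAGO solutions' published code: Spectrum ranks modules by a signal-to-noise analysis, and the placeholder ranking and the helper name apply_spectrum_targeting below are illustrative assumptions.

```python
# Conceptual sketch of Spectrum-style layer targeting (NOT the exact VAGO
# recipe): freeze everything, then unfreeze a chosen fraction of modules.
from transformers import AutoModelForCausalLM

def apply_spectrum_targeting(model, ranked_modules, fraction):
    """Unfreeze only the top `fraction` of a precomputed module ranking."""
    for param in model.parameters():
        param.requires_grad = False  # freeze the whole network first
    n_target = int(len(ranked_modules) * fraction)
    targets = set(ranked_modules[:n_target])
    for name, module in model.named_modules():
        if name in targets:
            for param in module.parameters():
                param.requires_grad = True  # train only targeted modules

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B")
# Placeholder ranking: real Spectrum orders modules by signal-to-noise ratio.
ranked = [f"model.layers.{i}" for i in range(len(model.model.layers))]

apply_spectrum_targeting(model, ranked, fraction=0.25)  # Phase 1: 25% of layers
# ... supervised fine-tuning on the 0.6B-token Phase 1 mixture ...
apply_spectrum_targeting(model, ranked, fraction=0.20)  # Phase 2: 20% of layers
# ... supervised fine-tuning on the 0.6B-token Phase 2 mixture ...
```

The design point is that each phase updates only a small, deliberately chosen slice of the network, leaving the rest of the pretrained weights untouched.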
Evaluation Highlights:
Evaluations across various benchmarks, including AGIEval, GPT4All, TruthfulQA, the Open LLM Leaderboard 2, MMLU (5-shot), and the Berkeley Function Calling Leaderboard, demonstrate the model's advancements. On the Open LLM Leaderboard it achieves an average score of 35.65, with notable results on IFEval (69.64) and MMLU-PRO (46.73); a hedged reproduction sketch follows.
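Scores of this kind can be approximated with EleutherAI's lm-evaluation-harness. The sketch below is an assumption-laden starting point: the task names leaderboard_ifeval and leaderboard_mmlu_pro are the harness's Open LLM Leaderboard 2 tasks, and the exact configuration behind the published numbers is not stated here, so results may differ.

```python
# Sketch of re-running two of the cited benchmarks with lm-evaluation-harness
# (pip install lm-eval). Task names and settings are assumptions; the exact
# configuration behind the published scores is not given in this card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=VAGOsolutions/SauerkrautLM-v2-14b-SFT,dtype=bfloat16",
    tasks=["leaderboard_ifeval", "leaderboard_mmlu_pro"],
    batch_size="auto",
)
print(results["results"])  # per-task metric dictionaries
```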
Good for:
- Applications requiring strong mathematical problem-solving.
- Scenarios needing reliable function calling capabilities (see the tool-use sketch after this list).
- Use cases demanding high-quality German and English language processing.
- Tasks benefiting from improved instruction following and common-sense reasoning.
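Since function calling is a headline capability, the promised tool-use sketch follows. It assumes the model inherits Qwen2.5's tool-aware chat template (the tools argument of apply_chat_template in recent transformers releases); the get_weather schema is a hypothetical placeholder, not an API shipped with the model.

```python
# Hedged tool-use sketch: assumes the model inherits Qwen2.5's tool-aware
# chat template (the `tools` kwarg of apply_chat_template). The get_weather
# schema is a hypothetical placeholder, not part of the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-v2-14b-SFT"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "Wie ist das Wetter in Berlin?"}]
input_ids = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# If the template is tool-aware, the model should emit a structured call
# naming get_weather rather than answering in free text.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```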