Spaetzle-v12-7b: A German-Optimized 7B Merge Model
cstr/Spaetzle-v12-7b is a 7-billion-parameter language model created with LazyMergekit by merging three specialized models. It uses mayflowergmbh/Wiedervereinigung-7b-dpo-laser as its base and integrates flemmingmiguel/NeuDist-Ro-7B, Blizado/discolm-mfto-7b-german-v0.1, and ResplendentAI/Flora_DPO_7B.
Key Characteristics & Performance
This iteration of the Spaetzle series targets improved German-language performance. Compared to its predecessor, Spaetzle-v8-7b, it trades a slight decrease on general English benchmarks for a gain on German ones, reaching a notable EQ-Bench (de) score of 64.81. It retains a 4096-token context length, and its overall average across AGIEval, GPT4All, TruthfulQA, and BigBench is 54.95%.
Merge Configuration
The model was constructed with the dare_ties merge method, using per-model density and weight parameters for each contributing model and int8_mask enabled. The tokenizer is inherited from the base model.
Ideal Use Cases
- German Language Processing: Excels in tasks requiring strong understanding and generation in German.
- Multilingual Applications: Suitable for scenarios where German language proficiency is a priority, even if general English performance is slightly lower than some alternatives.
- Research and Development: Provides a foundation for further fine-tuning or experimentation with merged models focused on specific linguistic domains.