cstr/Spaetzle-v12-7b

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Mar 11, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

cstr/Spaetzle-v12-7b is a 7-billion-parameter language model merged with LazyMergekit from flemmingmiguel/NeuDist-Ro-7B, Blizado/discolm-mfto-7b-german-v0.1, and ResplendentAI/Flora_DPO_7B, on the base model mayflowergmbh/Wiedervereinigung-7b-dpo-laser. It is optimized for German-language tasks, improving slightly over its predecessor, Spaetzle-v8-7b, on German benchmarks while retaining a 4,096-token context length. With an EQ-Bench (de) score of 64.81, it is suited to applications requiring strong German language understanding and generation.


Spaetzle-v12-7b: A German-Optimized 7B Merge Model

cstr/Spaetzle-v12-7b is a 7 billion parameter language model created through a merge of several specialized models using LazyMergekit. It builds upon mayflowergmbh/Wiedervereinigung-7b-dpo-laser and integrates flemmingmiguel/NeuDist-Ro-7B, Blizado/discolm-mfto-7b-german-v0.1, and ResplendentAI/Flora_DPO_7B.

Key Characteristics & Performance

This iteration of the Spaetzle series targets improved German-language performance. While it shows a slight decrease in general English performance relative to its predecessor, Spaetzle-v8-7b, it reaches an EQ-Bench (de) score of 64.81, indicating stronger German-specific capability. Its overall average across the AGIEval, GPT4All, TruthfulQA, and BigBench benchmarks is 54.95%.

Merge Configuration

The model was constructed with the dare_ties merge method, using a specific density and weight for each contributing model and with int8_mask enabled. The tokenizer is taken from the base model.
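To illustrate what a dare_ties-style merge does conceptually, the following is a minimal toy sketch on flat parameter vectors (not the actual mergekit implementation, and the real model's density and weight values are not reproduced here): DARE randomly drops a fraction of each task vector and rescales the survivors, and TIES keeps only the contributions that agree with an elected per-parameter sign.

```python
import numpy as np

def dare_ties_merge(base, finetuned_list, densities, weights, seed=0):
    """Toy sketch of a dare_ties-style merge on flat parameter vectors.

    DARE: randomly drop a fraction (1 - density) of each task vector
    (finetuned - base) and rescale the survivors by 1 / density.
    TIES: elect a sign per parameter from the summed contributions and
    keep only the contributions that agree with it.
    """
    rng = np.random.default_rng(seed)
    contributions = []
    for ft, density, weight in zip(finetuned_list, densities, weights):
        delta = ft - base                                # task vector
        mask = rng.random(delta.shape) < density         # keep with prob. density
        pruned = np.where(mask, delta / density, 0.0)    # drop-and-rescale
        contributions.append(weight * pruned)
    stacked = np.stack(contributions)
    elected_sign = np.sign(stacked.sum(axis=0))          # majority sign per parameter
    agree = np.sign(stacked) == elected_sign
    merged_delta = np.where(agree, stacked, 0.0).sum(axis=0)
    return base + merged_delta
```

With a single contributing model, density 1.0, and weight 1.0, the merge reduces to the fine-tuned weights themselves, which is a useful sanity check on the sketch.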

Ideal Use Cases

  • German Language Processing: Excels in tasks requiring strong understanding and generation in German.
  • Multilingual Applications: Suitable for scenarios where German language proficiency is a priority, even if general English performance is slightly lower than some alternatives.
  • Research and Development: Provides a foundation for further fine-tuning or experimentation with merged models focused on specific linguistic domains.
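The card does not state a prompt template. German DiscoLM-derived merges commonly use the ChatML format, so a minimal prompt-building helper might look like the sketch below; the template is an assumption and should be verified against the model's tokenizer configuration (e.g. its chat template) before use.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Build a ChatML-style prompt string.

    The ChatML format is assumed here, not confirmed by the model card;
    check the model's tokenizer_config / chat template before relying on it.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
```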