cstr/Spaetzle-v8-7b

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Mar 10, 2024 · Architecture: Transformer

cstr/Spaetzle-v8-7b is a 7 billion parameter merged language model, built on mayflowergmbh/Wiedervereinigung-7b-dpo-laser and incorporating flemmingmiguel/NeuDist-Ro-7B, johannhartmann/Brezn3, and ResplendentAI/Flora_DPO_7B. The model targets adequate performance in both German and English, emphasizing instruction following and reasoning while maintaining consistent behavior. It supports a 4096-token context length and suits use cases where robust instruction following matters more than perfect German grammar and orthography.


Spaetzle-v8-7b: A Merged Bilingual Model

cstr/Spaetzle-v8-7b is a 7 billion parameter language model created by merging flemmingmiguel/NeuDist-Ro-7B, johannhartmann/Brezn3, and ResplendentAI/Flora_DPO_7B onto the base model mayflowergmbh/Wiedervereinigung-7b-dpo-laser. It aims to provide solid performance in both German and English, with a particular focus on reliable instruction following and reasoning.

Key Characteristics & Performance

  • Bilingual Focus: Designed for adequate performance in both German and English tasks.
  • Behavioral Consistency: Prioritizes consistent output and avoids issues like rambling or template intermixing.
  • Instruction Following: Shows a preference for instruction following and reasoning over strict German grammatical perfection.
  • Evaluation Scores: Achieves an average of 72.27 on the Open LLM Leaderboard, with specific scores like 68.69 on AI2 Reasoning Challenge and 64.60 on MMLU. It also scores 61.04 on EQ-Bench (v2_de) and 78.3 on EQ-Bench (v2_english).
  • Context Length: Supports a context length of 4096 tokens.
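Because the context window is capped at 4096 tokens, long chat histories have to be trimmed before they reach the model. A minimal sketch of front-truncation so the most recent turns survive; the whitespace split here is a rough stand-in for the model's real tokenizer, which you would use in practice:

```python
def truncate_to_context(text: str, max_tokens: int = 4096) -> str:
    """Keep only the last max_tokens tokens of a prompt.

    Whitespace splitting is a rough proxy; real code should count
    tokens with the model's own tokenizer instead.
    """
    tokens = text.split()
    if len(tokens) <= max_tokens:
        return text
    return " ".join(tokens[-max_tokens:])

# A long history is trimmed from the front, preserving recent turns.
history = " ".join(f"turn{i}" for i in range(5000))
trimmed = truncate_to_context(history, max_tokens=4096)
print(len(trimmed.split()))  # 4096
```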

Use Cases & Considerations

  • Good for: Scenarios where robust instruction following and reasoning are critical, and minor imperfections in German grammar or orthography are acceptable.
  • Not ideal for: Applications requiring highly precise and grammatically perfect German text generation, where models like DiscoLM might be stronger.
  • Configuration: Utilizes the ChatML format for prompts, making it compatible with standard chat interfaces.
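The ChatML format wraps each turn in `<|im_start|>` / `<|im_end|>` markers. A minimal prompt builder as a sketch (the system message and conversation content are illustrative; in practice the tokenizer's chat template would produce this for you):

```python
def build_chatml_prompt(messages):
    """Render a list of {"role", "content"} dicts as a ChatML prompt,
    ending with an open assistant turn for the model to complete."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Was ist die Hauptstadt von Deutschland?"},
]
print(build_chatml_prompt(messages))
```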