cstr/Spaetzle-v60-7b

Hugging Face
Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 8k · Published: Apr 14, 2024 · License: cc-by-nc-4.0 · Architecture: Transformer · Open weights

cstr/Spaetzle-v60-7b is a 7 billion parameter language model, created through a progressive merge of abideen/AlphaMonarch-dora and cstr/Spaetzle-v58-7b. This model is specifically designed to offer a balanced compromise for both English and German language tasks, making it suitable for bilingual applications. It demonstrates competitive performance on various English and German benchmarks, including the Occiglot Euro LLM Leaderboard and the Low-bit Quantized Open LLM Leaderboard.


Spaetzle-v60-7b: A Bilingual 7B Merged Model

Spaetzle-v60-7b is a 7 billion parameter language model developed by cstr, created through a progressive merge using the dare_ties method. It combines abideen/AlphaMonarch-dora and cstr/Spaetzle-v58-7b to strike a workable compromise between English and German performance for local use.
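The actual merge recipe is not reproduced in this card, but a dare_ties merge of this kind is typically expressed as a mergekit configuration. The sketch below shows the general shape of such a config; the density and weight values are placeholders, not the parameters actually used for Spaetzle-v60-7b:

```yaml
# Illustrative mergekit config for a dare_ties merge.
# Densities/weights are placeholders, not the real Spaetzle-v60-7b values.
models:
  - model: cstr/Spaetzle-v58-7b
    # base model: no per-model parameters needed
  - model: abideen/AlphaMonarch-dora
    parameters:
      density: 0.60   # fraction of delta weights kept (DARE drop-and-rescale)
      weight: 0.30    # contribution of this model's task vector
merge_method: dare_ties
base_model: cstr/Spaetzle-v58-7b
dtype: bfloat16
```

dare_ties randomly drops a fraction of each fine-tuned model's weight deltas, rescales the survivors, and then resolves sign conflicts TIES-style before adding them to the base model.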

Key Capabilities & Performance

  • Bilingual Optimization: Intended for balanced performance across English and German language tasks.
  • Competitive Benchmarking: Achieves a score of 60.95 on the DE benchmark and 71.65 on the EN benchmark of the Occiglot Euro LLM Leaderboard, outperforming comparable and even larger models such as meta-llama/Meta-Llama-3-8B and mistralai/Mistral-7B-Instruct-v0.2 on certain German metrics.
  • Quantized Performance: The int4-inc quantized version demonstrates strong performance on the Low-bit Quantized Open LLM Leaderboard with an average score of 68.01, ranking closely behind Intel/SOLAR-10.7B-Instruct-v1.0-int4-inc while being a smaller model.
  • Low Contamination: Contamination checks against Mistral-7B-Instruct-v0.1 show very low results (e.g., MMLU < 0.1%, TruthfulQA < 0.1%), indicating minimal benchmark data overlap.

Ideal Use Cases

  • Bilingual Applications: Excellent for scenarios requiring robust performance in both English and German.
  • Resource-Constrained Environments: The 7B parameter size and strong quantized performance make it suitable for deployment where computational resources are limited.
  • General Text Generation: Capable of handling a variety of text generation tasks in both supported languages.
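For application use, the model can be queried through an OpenAI-compatible chat endpoint. The snippet below is a minimal sketch: the endpoint URL and the system prompts are assumptions for illustration, not values taken from this card, so adjust them for wherever the model is actually served.

```python
import json

# Assumed OpenAI-compatible endpoint; replace with your actual serving URL.
API_URL = "https://api.featherless.ai/v1/chat/completions"

def build_chat_request(prompt: str, lang_hint: str = "en") -> dict:
    """Assemble a chat-completion request body for cstr/Spaetzle-v60-7b.

    The bilingual system prompts are illustrative, matching the model's
    English/German focus.
    """
    system = {
        "en": "You are a helpful assistant.",
        "de": "Du bist ein hilfreicher Assistent.",
    }[lang_hint]
    return {
        "model": "cstr/Spaetzle-v60-7b",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 512,
    }

# German request, serialized and ready to POST with any HTTP client.
body = build_chat_request("Fasse den folgenden Text zusammen: ...", lang_hint="de")
payload = json.dumps(body)
```

Swapping the `lang_hint` is enough to steer the model between its two supported languages without changing any other plumbing.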

Popular Sampler Settings

The most popular parameter combinations used by Featherless users for this model adjust the following samplers: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.