cstr/llama3-8b-spaetzle-v13
cstr/llama3-8b-spaetzle-v13 is an 8-billion-parameter language model merged from Azure99/blossom-v5-llama3-8b and VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct, with a context length of 8192 tokens. The model performs strongly in both German and English, scoring 64.14 on EQ-Bench v2_de and 75.59 on the English EQ-Bench (v2). It is well suited to general-purpose conversational AI and to tasks requiring robust language understanding in either language.
Model Overview
cstr/llama3-8b-spaetzle-v13 is an 8-billion-parameter language model created by merging two Llama 3-based models: Azure99/blossom-v5-llama3-8b and VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct. The merge was performed with the dare_ties method, using VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct as the base model.
Key Capabilities & Performance
This model is designed for general language tasks and performs strongly in both English and German. It uses the standard Llama 3 prompt format. Key benchmark results:
- EQ Bench v2_de: 64.14
- English EQ-Bench Score (v2): 75.59
- Average MMLU: 68.06
- HellaSwag: 85.05
- GSM8K: 67.1
The model demonstrates proficiency in arithmetic and multi-step reasoning, as shown in sample outputs for mathematical problems and complex scenarios.
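Since the model uses the standard Llama 3 prompt format, prompts can be assembled with the documented Llama 3 special tokens. The sketch below builds a single-turn prompt by hand for illustration; in practice, the tokenizer's `apply_chat_template` method handles this automatically, and the helper function name here is hypothetical.

```python
# Sketch of the standard Llama 3 instruct prompt format this model expects.
# The token strings are the documented Llama 3 special tokens.

def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3 chat prompt by hand (illustrative)."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Example with a German system prompt, reflecting the model's bilingual focus.
prompt = build_llama3_prompt(
    "Du bist ein hilfreicher Assistent.",
    "Was ist die Hauptstadt von Bayern?",
)
print(prompt)
```

The trailing assistant header leaves the prompt open for the model's completion, which ends with its own `<|eot_id|>` token.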
Configuration Details
The merge configuration used a density of 0.65 and a weight of 0.4 for Azure99/blossom-v5-llama3-8b, with int8_mask enabled and dtype set to bfloat16. The tokenizer is taken from the base model.
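A plausible mergekit configuration reconstructed from the parameters above might look as follows; the exact YAML used for the original merge may differ in layout or additional defaults.

```yaml
# Reconstructed mergekit config (illustrative, not the author's original file)
models:
  - model: VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
    # base model; no per-model parameters needed
  - model: Azure99/blossom-v5-llama3-8b
    parameters:
      density: 0.65
      weight: 0.4
merge_method: dare_ties
base_model: VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
parameters:
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
```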
Ideal Use Cases
This model is suitable for applications requiring:
- Multilingual text generation: Particularly strong in German and English.
- Conversational AI: General chat and instruction-following tasks.
- Reasoning and problem-solving: Demonstrated ability to handle logical and mathematical queries.