cstr/llama3-8b-spaetzle-v13

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · License: llama3 · Architecture: Transformer · Warm

cstr/llama3-8b-spaetzle-v13 is an 8 billion parameter language model merged from Azure99/blossom-v5-llama3-8b and VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct, with a context length of 8,192 tokens. The model performs strongly in both German and English, achieving an EQ Bench v2_de score of 64.14 and an English EQ-Bench (v2) score of 75.59. It is well-suited to general-purpose conversational AI and tasks requiring robust language understanding in both languages.


Model Overview

cstr/llama3-8b-spaetzle-v13 is an 8 billion parameter language model, created by merging two distinct Llama 3-based models: Azure99/blossom-v5-llama3-8b and VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct. This merge was performed using the dare_ties method, with VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct serving as the base model.

Key Capabilities & Performance

This model is designed for general language tasks and exhibits strong performance in both English and German. It maintains the standard Llama 3 prompt format. Key benchmark results include:

  • EQ Bench v2_de: 64.14
  • English EQ-Bench Score (v2): 75.59
  • Average MMLU: 68.06
  • HellaSwag: 85.05
  • GSM8K: 67.1

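The standard Llama 3 prompt format mentioned above wraps each conversation turn in special header tokens. A minimal sketch of assembling such a prompt by hand (the token strings follow the published Llama 3 chat format; in practice, the tokenizer's `apply_chat_template` method produces this automatically):

```python
def build_llama3_prompt(messages):
    """Assemble a raw Llama 3 chat prompt from a list of
    {"role": ..., "content": ...} dicts."""
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += (
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Leave an open assistant header so the model generates the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = build_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Wie viel ist 17 mal 23?"},
])
```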
The model demonstrates proficiency in arithmetic and multi-step reasoning, as shown in sample outputs for mathematical problems and complex scenarios.

Configuration Details

The merge configuration utilized a density of 0.65 and a weight of 0.4 for Azure99/blossom-v5-llama3-8b, with int8_mask enabled and dtype set to bfloat16. The tokenizer source is from the base model.
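Given those parameters, the mergekit configuration likely resembled the sketch below. This is a reconstruction from the stated values, not the author's published file; field names follow mergekit's YAML schema:

```yaml
models:
  # Base model: contributes everything not overridden by the merge.
  - model: VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
  - model: Azure99/blossom-v5-llama3-8b
    parameters:
      density: 0.65
      weight: 0.4
merge_method: dare_ties
base_model: VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
parameters:
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
```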

Ideal Use Cases

This model is suitable for applications requiring:

  • Multilingual text generation: Particularly strong in German and English.
  • Conversational AI: General chat and instruction-following tasks.
  • Reasoning and problem-solving: Demonstrated ability to handle logical and mathematical queries.

Popular Sampler Settings

Featherless users most commonly tune the following sampler parameters for this model:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
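These samplers map directly onto the fields of an OpenAI-compatible completion request, which providers such as Featherless typically extend with extra sampler fields like `top_k`, `repetition_penalty`, and `min_p`. A sketch of such a request payload; the values below are illustrative placeholders, not actual user configurations:

```python
# Illustrative sampler settings; the specific values are placeholders,
# not measured "top 3" configurations.
payload = {
    "model": "cstr/llama3-8b-spaetzle-v13",
    "prompt": "Erkläre kurz den Satz des Pythagoras.",
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.1,
    "min_p": 0.05,
    "max_tokens": 256,
}

# A client would POST this JSON to the provider's completions endpoint,
# with an Authorization header carrying the API key, e.g.:
# requests.post(f"{BASE_URL}/v1/completions", json=payload,
#               headers={"Authorization": f"Bearer {API_KEY}"})
```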