DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kPublished:May 24, 2024License:llama3Architecture:Transformer0.0K Warm

DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1 is an 8 billion parameter instruction-tuned causal language model developed by DiscoResearch and Occiglot, with support from DFKI and hessian.Ai. Derived from Meta's Llama3-8B, it was continuously pretrained on 65 billion high-quality German tokens and further trained on 100 million tokens for a 32k context length. This model is specifically fine-tuned on a German instruction dataset, making it highly effective for German language tasks and long-context applications.

Loading preview...

Model Overview

DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1 is an instruction-tuned variant of the Llama3-8B architecture, developed through a collaboration between DiscoResearch and Occiglot, with contributions from DFKI and hessian.Ai. It builds upon a base model continuously pretrained on 65 billion high-quality German tokens, similar to LeoLM and Occiglot models. A key feature is its extended context window, achieved by training on an additional 100 million tokens at a 32k context length, utilizing a rope_theta value of 1.5e6.

Key Capabilities

  • German Language Proficiency: Continuously pretrained on a vast corpus of German tokens, making it highly capable for German-centric tasks.
  • Extended Context Window: Supports a 32k context length, enabling processing and generation of longer texts.
  • Instruction Following: Fine-tuned on a dedicated German instruction dataset, enhancing its ability to understand and execute complex instructions.
  • Llama-3 Chat Template: Utilizes the standard Llama-3 chat template for easy integration with transformers chat templating.

Performance

Evaluated against common English and German benchmarks (GermanBench), the model demonstrates strong performance, particularly in German-specific metrics. It achieves a mean score of 0.60547 across various benchmarks, showing competitive results against Meta's Llama-3-8B-Instruct and other DiscoResearch models.

Good For

  • Applications requiring robust German language understanding and generation.
  • Tasks benefiting from a large context window, such as summarizing long documents or engaging in extended conversations in German.
  • Instruction-following tasks where precise responses to German prompts are critical.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p