DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1
DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1 is an 8 billion parameter instruction-tuned causal language model developed by DiscoResearch and Occiglot, with support from DFKI and hessian.Ai. Derived from Meta's Llama3-8B, it was continuously pretrained on 65 billion high-quality German tokens and further trained on 100 million tokens for a 32k context length. This model is specifically fine-tuned on a German instruction dataset, making it highly effective for German language tasks and long-context applications.
Loading preview...
Model Overview
DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1 is an instruction-tuned variant of the Llama3-8B architecture, developed through a collaboration between DiscoResearch and Occiglot, with contributions from DFKI and hessian.Ai. It builds upon a base model continuously pretrained on 65 billion high-quality German tokens, similar to LeoLM and Occiglot models. A key feature is its extended context window, achieved by training on an additional 100 million tokens at a 32k context length, utilizing a rope_theta value of 1.5e6.
Key Capabilities
- German Language Proficiency: Continuously pretrained on a vast corpus of German tokens, making it highly capable for German-centric tasks.
- Extended Context Window: Supports a 32k context length, enabling processing and generation of longer texts.
- Instruction Following: Fine-tuned on a dedicated German instruction dataset, enhancing its ability to understand and execute complex instructions.
- Llama-3 Chat Template: Utilizes the standard Llama-3 chat template for easy integration with
transformerschat templating.
Performance
Evaluated against common English and German benchmarks (GermanBench), the model demonstrates strong performance, particularly in German-specific metrics. It achieves a mean score of 0.60547 across various benchmarks, showing competitive results against Meta's Llama-3-8B-Instruct and other DiscoResearch models.
Good For
- Applications requiring robust German language understanding and generation.
- Tasks benefiting from a large context window, such as summarizing long documents or engaging in extended conversations in German.
- Instruction-following tasks where precise responses to German prompts are critical.
Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.