DiscoResearch/Llama3-German-8B

Text Generation · Model Size: 8B · Quant: FP8 · Context Length: 8K · Published: May 23, 2024 · License: llama3 · Architecture: Transformer

DiscoResearch/Llama3-German-8B is an 8-billion-parameter large language model based on Meta's Llama3-8B, continually pretrained on 65 billion high-quality German tokens. Developed by DiscoResearch and Occiglot, it substantially improves German linguistic capability and reasoning, particularly on the HellaSwag benchmark, while preserving English performance. The model addresses Llama3's weak German performance, a consequence of its limited multilingual training data, and is intended as a base model for further fine-tuning on German-specific applications.


Overview

DiscoResearch/Llama3-German-8B is an 8-billion-parameter large language model, a specialized version of Meta's Llama3-8B continually pretrained on 65 billion high-quality German tokens. The effort, a collaboration between DiscoResearch and Occiglot with support from DFKI and hessian.Ai, addresses Llama3's weak German performance caused by its limited multilingual training data.
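As a quick orientation, here is a minimal sketch of loading the model for text generation with the Hugging Face `transformers` library. The model id comes from this card; the dtype, device placement, and generation parameters are illustrative assumptions, not recommended settings:

```python
MODEL_ID = "DiscoResearch/Llama3-German-8B"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the base model and continue the given prompt.

    Imports are deferred so this sketch can be defined without downloading
    the (large) model weights; `torch` and `transformers` are assumed to
    be installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Because this is a base model rather than an instruction-tuned one, prompts should be plain-text continuations, e.g. `generate("Die Hauptstadt von Deutschland ist")`, not chat-style instructions.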

Key Capabilities

  • Enhanced German Performance: Shows strong improvements in German linguistic understanding and general reasoning over the base Llama3-8B, particularly on the HellaSwag benchmark.
  • Minimal English Degradation: Achieves German specialization with little impact on English performance, even though no English data was replayed during training.
  • Efficient Training: Utilizes a novel document packing strategy based on "Fewer Truncations Improve Language Modeling" for higher packing efficiency and improved benchmark scores.
  • Base Model: Designed as a foundational model, suitable for further fine-tuning to specific German language tasks.
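The packing strategy mentioned above can be illustrated with a short sketch. "Fewer Truncations Improve Language Modeling" frames training-sequence construction as a bin-packing problem so that documents are not split at sequence boundaries; the best-fit heuristic below is a simplified illustration of that idea (function name and details are assumptions, not the paper's or DiscoResearch's implementation):

```python
def pack_documents(doc_lengths, max_len):
    """Pack documents (given as token counts, each <= max_len) into
    fixed-length training sequences without truncating any document.

    Best-fit-decreasing heuristic: visit documents longest-first and put
    each one into the sequence with the least remaining room that still
    fits it, opening a new sequence only when none fits.
    """
    bins = []  # each bin: [remaining_space, [document indices]]
    order = sorted(range(len(doc_lengths)), key=lambda i: -doc_lengths[i])
    for i in order:
        length = doc_lengths[i]
        best = None
        for b in bins:
            if b[0] >= length and (best is None or b[0] < best[0]):
                best = b  # tightest bin that still fits this document
        if best is None:
            best = [max_len, []]
            bins.append(best)
        best[0] -= length
        best[1].append(i)
    return [docs for _, docs in bins]
```

For example, `pack_documents([7, 5, 3, 3, 2], max_len=8)` yields three sequences with no document crossing a boundary, whereas naive concatenation into 8-token windows would truncate mid-document.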

Model Configurations

DiscoResearch offers several configurations:

  • Base model (this one) with continued pretraining.
  • Long-context version (32k context length).
  • Instruction-tuned versions of both base and long-context models.
  • Experimental DARE-TIES Merge with Llama3-Instruct.
  • Collection of Quantized versions.

Good For

  • Developers and researchers requiring a high-performance German-centric large language model.
  • Applications demanding strong linguistic understanding and reasoning in German.
  • As a base for fine-tuning custom German-specific LLM applications.
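Since the card positions this as a base model for further fine-tuning, a parameter-efficient starting point can be sketched with the `peft` library. The LoRA rank, alpha, and target modules below are illustrative assumptions, not recommended values:

```python
MODEL_ID = "DiscoResearch/Llama3-German-8B"

def build_lora_model():
    """Wrap the base model with LoRA adapters for parameter-efficient
    German fine-tuning.

    Imports are deferred so this sketch can be defined without the model
    weights; `transformers` and `peft` are assumed to be installed.
    """
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],  # Llama attention projections
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, config)
```

The resulting model can then be trained on a German instruction or domain corpus with the standard `transformers` `Trainer`, updating only the adapter weights.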
