DiscoResearch/Llama3-German-8B

Text Generation · Model Size: 8B · Quant: FP8 · Context Length: 8K · Published: May 23, 2024 · License: llama3 · Architecture: Transformer

DiscoResearch/Llama3-German-8B is an 8-billion-parameter large language model based on Meta's Llama3-8B, continually pretrained on 65 billion high-quality German tokens. Developed by DiscoResearch and Occiglot, it substantially improves German linguistic capability and reasoning, particularly on the HellaSwag benchmark, while preserving English performance. The model addresses Llama3's weak German performance, a consequence of its limited multilingual training data, and is intended as a base model for further fine-tuning on German-specific applications.


Overview

DiscoResearch/Llama3-German-8B is an 8-billion-parameter large language model, a specialized version of Meta's Llama3-8B continually pretrained on 65 billion high-quality German tokens. The effort, a collaboration between DiscoResearch and Occiglot with support from DFKI and hessian.Ai, addresses Llama3's weak German performance caused by its limited multilingual training data.
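As a quick orientation, here is a minimal sketch of loading the model for text generation with the Hugging Face `transformers` library. The model id comes from this card; the dtype, device placement, and generation parameters are illustrative assumptions, not recommended settings:

```python
MODEL_ID = "DiscoResearch/Llama3-German-8B"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the base model and continue the given prompt.

    Imports are deferred so this sketch can be defined without downloading
    the (large) model weights; `torch` and `transformers` are assumed to
    be installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Because this is a base model rather than an instruction-tuned one, prompts should be plain-text continuations, e.g. `generate("Die Hauptstadt von Deutschland ist")`, not chat-style instructions.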

Key Capabilities

  • Enhanced German Performance: Shows strong improvements in German linguistic understanding and general reasoning over the base Llama3-8B, particularly on the HellaSwag benchmark.
  • Minimal English Degradation: Achieves German specialization with little impact on English performance, even though no English data was replayed during training.
  • Efficient Training: Utilizes a novel document packing strategy based on "Fewer Truncations Improve Language Modeling" for higher packing efficiency and improved benchmark scores.
  • Base Model: Designed as a foundational model, suitable for further fine-tuning to specific German language tasks.
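The packing strategy mentioned above can be illustrated with a short sketch. "Fewer Truncations Improve Language Modeling" frames training-sequence construction as a bin-packing problem so that documents are not split at sequence boundaries; the best-fit heuristic below is a simplified illustration of that idea (function name and details are assumptions, not the paper's or DiscoResearch's implementation):

```python
def pack_documents(doc_lengths, max_len):
    """Pack documents (given as token counts, each <= max_len) into
    fixed-length training sequences without truncating any document.

    Best-fit-decreasing heuristic: visit documents longest-first and put
    each one into the sequence with the least remaining room that still
    fits it, opening a new sequence only when none fits.
    """
    bins = []  # each bin: [remaining_space, [document indices]]
    order = sorted(range(len(doc_lengths)), key=lambda i: -doc_lengths[i])
    for i in order:
        length = doc_lengths[i]
        best = None
        for b in bins:
            if b[0] >= length and (best is None or b[0] < best[0]):
                best = b  # tightest bin that still fits this document
        if best is None:
            best = [max_len, []]
            bins.append(best)
        best[0] -= length
        best[1].append(i)
    return [docs for _, docs in bins]
```

For example, `pack_documents([7, 5, 3, 3, 2], max_len=8)` yields three sequences with no document crossing a boundary, whereas naive concatenation into 8-token windows would truncate mid-document.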

Model Configurations

DiscoResearch offers several configurations:

  • Base model (this one) with continued pretraining.
  • Long-context version (32k context length).
  • Instruction-tuned versions of both base and long-context models.
  • Experimental DARE-TIES Merge with Llama3-Instruct.
  • Collection of Quantized versions.

Good For

  • Developers and researchers requiring a high-performance German-centric large language model.
  • Applications demanding strong linguistic understanding and reasoning in German.
  • As a base for fine-tuning custom German-specific LLM applications.
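Since the card positions this as a base model for further fine-tuning, a parameter-efficient starting point can be sketched with the `peft` library. The LoRA rank, alpha, and target modules below are illustrative assumptions, not recommended values:

```python
MODEL_ID = "DiscoResearch/Llama3-German-8B"

def build_lora_model():
    """Wrap the base model with LoRA adapters for parameter-efficient
    German fine-tuning.

    Imports are deferred so this sketch can be defined without the model
    weights; `transformers` and `peft` are assumed to be installed.
    """
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],  # Llama attention projections
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, config)
```

The resulting model can then be trained on a German instruction or domain corpus with the standard `transformers` `Trainer`, updating only the adapter weights.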
