DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kPublished:May 23, 2024License:llama3Architecture:Transformer0.0K Warm

DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1 is an 8 billion parameter instruction-tuned language model developed by DiscoResearch and Occiglot, with support from DFKI and hessian.Ai. Derived from Meta's Llama3-8B, it underwent continuous pretraining on 65 billion high-quality German tokens and was fine-tuned on a German instruction dataset. This model excels in German language understanding and generation, offering strong performance across various German benchmarks with an 8192 token context length.

Loading preview...

Llama3-DiscoLeo-Instruct-8B-v0.1 Overview

This model is an 8 billion parameter instruction-tuned variant, a collaborative effort by DiscoResearch and Occiglot, supported by DFKI and hessian.Ai. It builds upon Meta's Llama3-8B, having undergone extensive continuous pretraining on 65 billion high-quality German tokens, similar to established LeoLM and Occiglot models. The instruction tuning phase utilized a dedicated German instruction dataset developed by DiscoResearch.

Key Capabilities & Features

  • Optimized for German Language: Continuously pretrained on a massive German token dataset, making it highly proficient in German understanding and generation.
  • Instruction-Tuned: Fine-tuned on a specific German instruction dataset for improved conversational and task-oriented performance.
  • Llama-3 Chat Template: Utilizes the standard Llama-3 chat template, ensuring compatibility and ease of use with transformers library's chat templating.
  • Strong German Benchmark Performance: Achieves a mean score of 0.60552 across a suite of German and English benchmarks, outperforming Meta-Llama-3-8B-Instruct in several German-specific evaluations like truthful_qa_de and arc_challenge_de.
  • 8192 Token Context Length: Supports a substantial context window for processing longer inputs and generating more coherent responses.

When to Use This Model

  • German Language Applications: Ideal for chatbots, content generation, summarization, and question-answering systems requiring high proficiency in German.
  • Instruction-Following Tasks: Suited for applications where the model needs to adhere to specific instructions and generate structured outputs.
  • Research and Development: A valuable resource for researchers focusing on German NLP and evaluating instruction-tuned models in a multilingual context.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p