mlabonne/OrpoLlama-3-8B

Text Generation · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Concurrency Cost: 1 · Published: Apr 18, 2024 · License: other · Architecture: Transformer

OrpoLlama-3-8B is an 8 billion parameter language model developed by mlabonne, fine-tuned from Meta-Llama-3-8B with the ORPO method on the mlabonne/orpo-dpo-mix-40k dataset. The model uses an 8192-token context window and follows the ChatML template. It outperforms Llama-3-8B-Instruct on specific benchmarks such as GPT4All and TruthfulQA, making it suitable for general conversational AI and question-answering tasks.
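Since the 8192-token window bounds how much conversation history the model can see, a serving layer typically trims older turns to fit. A minimal sketch, assuming per-message token counts have already been computed (a real pipeline would count them with the model's tokenizer; the function name and reserve size here are illustrative):

```python
def fit_history(messages, counts, budget=8192, reserve=512):
    """Keep the newest messages whose token counts fit the window.

    messages: list of {"role", "content"} dicts, oldest first.
    counts:   per-message token counts (assumed pre-computed).
    reserve:  tokens left free for the model's reply (illustrative).
    """
    limit = budget - reserve
    kept, total = [], 0
    # Walk from newest to oldest, keeping turns while they fit.
    for msg, n in zip(reversed(messages), reversed(counts)):
        if total + n > limit:
            break
        kept.append(msg)
        total += n
    kept.reverse()  # restore oldest-first order
    return kept


history = [{"role": "user", "content": str(i)} for i in range(5)]
token_counts = [4000, 3000, 500, 300, 200]
trimmed = fit_history(history, token_counts)
```

With the counts above, the oldest 4000-token message is dropped and the remaining four turns (3000 + 500 + 300 + 200 tokens) fit within the 7680-token allowance.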


Overview

mlabonne/OrpoLlama-3-8B is an 8 billion parameter language model, fine-tuned by mlabonne from the base meta-llama/Meta-Llama-3-8B model. The fine-tuning process used the ORPO (Odds Ratio Preference Optimization) method on the mlabonne/orpo-dpo-mix-40k dataset, as detailed in this article.

Key Capabilities and Features

  • Architecture: Based on the Llama 3 family, providing a strong foundation for general language understanding and generation.
  • Context Window: Supports an 8k token context window, allowing for processing longer inputs and generating more coherent, extended responses.
  • ChatML Template: Trained to follow the ChatML template, ensuring compatibility and optimal performance with chat-based applications.
  • Quantized Versions: Various quantized versions are available for efficient deployment, including GGUF, AWQ, and EXL2 formats, provided by community contributors like bartowski, solidrust, and LoneStriker.
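The ChatML template mentioned above wraps each conversation turn in `<|im_start|>`/`<|im_end|>` markers. In practice, `tokenizer.apply_chat_template` in the transformers library renders this from the template bundled with the model; the helper below is an illustrative by-hand sketch of the same format:

```python
def to_chatml(messages):
    """Render a list of {"role", "content"} dicts as a ChatML prompt."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Leave the assistant turn open so the model completes it.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is ORPO?"},
])
```

The trailing open `<|im_start|>assistant` turn is what cues the model to generate its reply; generation is then stopped when the model emits `<|im_end|>`.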

Performance Highlights

Evaluations using LLM AutoEval indicate that OrpoLlama-3-8B outperforms Meta's instruction-tuned Llama-3-8B-Instruct on specific datasets:

  • GPT4All: Achieves 70.59%, surpassing Llama-3-8B-Instruct's 69.86%.
  • TruthfulQA: Scores 52.39%, compared to Llama-3-8B-Instruct's 51.65%.

While its overall average score on the Nous benchmark suite is slightly lower than that of Llama-3-8B-Instruct, its strengths in these areas make it a compelling choice for tasks that demand factual accuracy and general knowledge. Training curves and detailed experiment logs are available on W&B.