heipah/TwinLlama-3.1-8B-DPO

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Mar 29, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

TwinLlama-3.1-8B-DPO by heipah is an 8-billion-parameter Llama-based causal language model fine-tuned with Direct Preference Optimization (DPO). It was trained roughly 2x faster using Unsloth together with Hugging Face's TRL library, making it an efficient choice for applications that need a performant Llama variant. It is designed for general language understanding and generation tasks.


TwinLlama-3.1-8B-DPO Overview

TwinLlama-3.1-8B-DPO is an 8-billion-parameter language model developed by heipah. It is a fine-tuned variant of the heipah/TwinLlama-3.1-8B base model, trained with Direct Preference Optimization (DPO). A key differentiator is its training efficiency: it was trained approximately 2x faster by combining Unsloth with Hugging Face's TRL library.
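As a Llama 3.1 variant, the model is typically prompted with the standard Llama 3.1 instruct chat format. The sketch below builds such a prompt by hand; the special tokens are an assumption based on the common Llama 3.1 template (this card does not document the model's own template), and in practice `tokenizer.apply_chat_template` from `transformers` is the safer route.

```python
# Minimal prompt builder for the standard Llama 3.1 instruct chat format.
# ASSUMPTION: TwinLlama-3.1-8B-DPO keeps the usual Llama 3.1 special tokens;
# prefer tokenizer.apply_chat_template when loading the model for real use.

def build_llama31_prompt(user_message: str, system_message: str = "") -> str:
    parts = ["<|begin_of_text|>"]
    if system_message:
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>"
        )
    parts.append(
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>"
    )
    # End with an open assistant header so the model generates the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama31_prompt("Summarize DPO in one sentence.")
```

The resulting string can be passed directly to a text-generation pipeline loaded from `heipah/TwinLlama-3.1-8B-DPO`.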

Key Characteristics

  • Model Architecture: Llama-based, 8 billion parameters.
  • Training Method: Fine-tuned using Direct Preference Optimization (DPO).
  • Training Efficiency: Achieved 2x faster training speeds with Unsloth and Hugging Face TRL.
  • License: Distributed under the Apache-2.0 license.
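To make the DPO training method above concrete, the sketch below computes the standard DPO loss for a single preference pair from the log-probabilities that the policy and frozen reference model assign to the chosen and rejected responses. The numeric inputs are illustrative, not taken from this model's training run; in practice TRL's `DPOTrainer` handles this computation.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair:
    -log sigmoid(beta * [(pi_w - ref_w) - (pi_l - ref_l)])
    where each term is a sequence log-probability."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # log1p(exp(-x)) is a numerically stable form of -log(sigmoid(x)).
    return math.log1p(math.exp(-margin))

# Illustrative values: the policy prefers the chosen response more than
# the reference does, so the loss falls below log(2).
loss = dpo_loss(-1.0, -2.0, ref_chosen_logp=-1.5, ref_rejected_logp=-1.8)
```

When the policy and reference agree exactly, the margin is zero and the loss equals log(2); widening the policy's preference for the chosen response drives the loss toward zero.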

Good For

  • Applications requiring a performant and efficiently trained Llama-based model.
  • General language understanding and generation tasks where optimized training is beneficial.
  • Developers looking for a Llama 3.1 variant with a focus on training speed and DPO fine-tuning.