trl-lib/Qwen2-0.5B-ORPO

Hugging Face
Text generation · Model size: 0.5B · Quantization: BF16 · Context length: 32k · Published: Oct 11, 2024 · Architecture: Transformer

The trl-lib/Qwen2-0.5B-ORPO model is a 0.5-billion-parameter language model published by trl-lib and fine-tuned from Qwen/Qwen2-0.5B-Instruct. It was trained with ORPO (Odds Ratio Preference Optimization), a monolithic preference-optimization method that requires no reference model, on the ultrafeedback_binarized dataset using the TRL framework. The model offers a compact option for generating responses aligned with human preferences and supports a context length of up to 131,072 tokens.

Overview

trl-lib/Qwen2-0.5B-ORPO is a 0.5-billion-parameter language model fine-tuned from the base Qwen/Qwen2-0.5B-Instruct model. It was developed by trl-lib and trained using the TRL (Transformer Reinforcement Learning) framework. Its key differentiator is the training method: ORPO (Odds Ratio Preference Optimization), a monolithic approach that folds the preference objective into supervised fine-tuning and therefore needs no separate reference model, unlike methods such as DPO. Training used the trl-lib/ultrafeedback_binarized dataset, making the model suitable for tasks that call for alignment with human feedback.

Key Capabilities

  • Preference Optimization: Trained with ORPO, it generates responses aligned with the human preferences encoded in its training data.
  • Efficient Fine-tuning: Leverages the TRL library for an effective and streamlined fine-tuning pipeline.
  • Compact Size: At 0.5 billion parameters, it offers a lightweight option for preference-aligned text generation.
  • Large Context Window: Inherits a context length of up to 131,072 tokens from the Qwen2 base model, allowing it to process extensive inputs.
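The checkpoint loads like any causal LM on the Hugging Face Hub. A minimal inference sketch (the prompt is an arbitrary example; chat-style pipeline input assumes a recent transformers release):

```python
from transformers import pipeline

# Load the ORPO-tuned checkpoint from the Hugging Face Hub.
generator = pipeline("text-generation", model="trl-lib/Qwen2-0.5B-ORPO")

# Chat-style input; the pipeline applies the model's chat template.
messages = [
    {"role": "user", "content": "Give one tip for writing clear documentation."}
]
result = generator(messages, max_new_tokens=64)

# The pipeline returns the conversation with the assistant reply appended.
print(result[0]["generated_text"][-1]["content"])
```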

Good for

  • Applications requiring models optimized for human preferences.
  • Scenarios where a smaller, efficient model with a large context window is beneficial.
  • Research and development in preference optimization techniques, particularly ORPO.
  • Generating high-quality, aligned text in resource-constrained environments.