Overview
trl-lib/Qwen2-0.5B-ORPO is a 0.5 billion parameter language model fine-tuned from Qwen/Qwen2-0.5B-Instruct. It was developed by trl-lib and trained with the TRL (Transformer Reinforcement Learning) framework. Its key differentiator is the training methodology: ORPO (Odds Ratio Preference Optimization), a monolithic approach that optimizes for preferences without requiring a separate reference model. Training used the trl-lib/ultrafeedback_binarized dataset, making the model suitable for tasks that require alignment with human feedback.
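The idea behind ORPO can be illustrated with a minimal sketch of its odds-ratio term. This is a simplification for intuition only: the real trainer works on averaged token log-probabilities and adds a standard supervised NLL term weighted against the odds-ratio term, none of which is shown here.

```python
import math

def orpo_odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    """Sketch of ORPO's odds-ratio term: -log sigmoid(log-odds ratio),
    where odds(p) = p / (1 - p) and p is a sequence probability."""
    def log_odds(logp: float) -> float:
        # log(p / (1 - p)) from log p; log1p(-exp(logp)) gives log(1 - p)
        return logp - math.log1p(-math.exp(logp))

    log_odds_ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log sigmoid(x) = log(1 + exp(-x))
    return math.log1p(math.exp(-log_odds_ratio))

# The loss is small when the model already prefers the chosen response
# and large when it prefers the rejected one.
low = orpo_odds_ratio_loss(math.log(0.8), math.log(0.2))
high = orpo_odds_ratio_loss(math.log(0.2), math.log(0.8))
print(low < high)  # True
```

Because the comparison is between the policy's own probabilities for the chosen and rejected responses, no frozen reference model is needed, which is what distinguishes ORPO from methods like DPO.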
Key Capabilities
- Preference Optimization: Trained with ORPO, it generates responses aligned with the human preference judgments captured in its training data.
- Efficient Fine-tuning: Leverages the TRL library for effective and streamlined fine-tuning processes.
- Compact Size: At 0.5 billion parameters, it offers a lightweight solution for preference-aligned text generation.
- Large Context Window: Inherits a substantial context length of 131072 tokens, allowing for processing extensive inputs.
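Preference-optimization training like this consumes paired examples, each with a preferred and a dispreferred completion for the same prompt. Below is a minimal sketch of such a record in a chat-message layout; the `chosen`/`rejected` field names and message schema are illustrative assumptions, not pulled from the actual dataset files.

```python
# Sketch of one binarized preference record (schema is assumed, not
# taken from trl-lib/ultrafeedback_binarized itself).
def make_preference_pair(prompt: str, chosen: str, rejected: str) -> dict:
    """Package one prompt with a preferred and a dispreferred completion."""
    return {
        "chosen": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": chosen},
        ],
        "rejected": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": rejected},
        ],
    }

pair = make_preference_pair(
    "What is 2 + 2?",
    "2 + 2 equals 4.",
    "2 + 2 equals 5.",
)
print(pair["chosen"][-1]["content"])  # 2 + 2 equals 4.
```

During training, the model's log-probabilities for the `chosen` and `rejected` messages feed the odds-ratio comparison, steering generation toward the preferred style of response.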
Good for
- Applications requiring models optimized for human preferences.
- Scenarios where a smaller, efficient model with a large context window is beneficial.
- Research and development in preference optimization techniques, particularly ORPO.
- Generating high-quality, aligned text in resource-constrained environments.