trl-lib/Qwen2-0.5B-ORPO
Text generation · Model size: 0.5B · Quantization: BF16 · Context length: 32k · Published: Oct 11, 2024 · Architecture: Transformer

The trl-lib/Qwen2-0.5B-ORPO model is a 0.5 billion parameter language model published by trl-lib, fine-tuned from Qwen/Qwen2-0.5B-Instruct. It was trained with ORPO (Odds Ratio Preference Optimization, introduced in the paper "ORPO: Monolithic Preference Optimization without Reference Model") on the ultrafeedback_binarized dataset using the TRL library. The model targets preference-alignment use cases, offering a compact option for generating responses aligned with human preferences, and its configuration reports a maximum context length of 131,072 tokens.
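To make the ORPO objective concrete, the sketch below computes its loss for a single (chosen, rejected) pair in pure Python. It follows the paper's formulation: the supervised NLL on the chosen response plus a weighted odds-ratio term, where odds(y) = P(y) / (1 − P(y)). The sequence log-probabilities and the weight `lam` (called `beta` in TRL, default 0.1) are illustrative placeholders, not values from this model's training run.

```python
import math

def log_odds(logp):
    # odds(y) = p / (1 - p); computed in log space for stability:
    # log odds = log p - log(1 - p)
    return logp - math.log1p(-math.exp(logp))

def orpo_loss(logp_chosen, logp_rejected, lam=0.1):
    # Odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected)).
    # Small when the model already prefers the chosen response.
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    l_or = math.log1p(math.exp(-ratio))  # equals -log(sigmoid(ratio))
    # Total loss: NLL of the chosen response + lambda * odds-ratio penalty.
    # No reference model is needed, unlike DPO.
    return -logp_chosen + lam * l_or

# Toy log-probabilities: the model favors the chosen completion.
print(orpo_loss(logp_chosen=-1.0, logp_rejected=-3.0))
```

Because the odds-ratio term is computed from the policy's own probabilities, ORPO needs no frozen reference model, which is what makes it "monolithic".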

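Since the checkpoint is published on the Hugging Face Hub, it can be loaded with the standard transformers `pipeline` API. A minimal sketch (the prompt and generation settings are arbitrary choices for illustration):

```python
from transformers import pipeline

# Download and load the fine-tuned checkpoint from the Hub.
generator = pipeline("text-generation", model="trl-lib/Qwen2-0.5B-ORPO")

prompt = "Question: What is preference optimization?\nAnswer:"
# Greedy decoding for a deterministic result; the output string
# includes the prompt followed by the model's continuation.
result = generator(prompt, max_new_tokens=32, do_sample=False)
print(result[0]["generated_text"])
```

For chat-style use, the instruct lineage of the base model means the tokenizer's chat template can also be applied before generation.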