qgallouedec/online-dpo-qwen2-2
Text Generation | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Architecture: Transformer

qgallouedec/online-dpo-qwen2-2 is a 0.5-billion-parameter causal language model fine-tuned from Qwen/Qwen2-0.5B-Instruct on prompts from the trl-lib/ultrafeedback-prompt dataset, with the aim of improving instruction following. This listing serves it with a 32k-token context window, which suits tasks that need longer context and precise responses to user prompts.
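The card does not describe the training procedure, but the model name points to online DPO. (TRL provides an `OnlineDPOTrainer`, and trl-lib/ultrafeedback-prompt contains prompts only, with no reference completions.) In online DPO, the current policy samples two completions per prompt, a judge or reward model marks one as preferred, and the policy is updated with the DPO loss against a frozen reference model. The following is a minimal sketch of that loss for a single pair. It assumes the summed per-completion log-probabilities are already computed. The name `dpo_loss` and the default `beta=0.1` are illustrative choices, not settings confirmed for this model.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) completion pair.

    Each argument is the summed log-probability of a whole completion given
    the prompt, under either the policy being trained or the frozen reference.
    beta=0.1 is an illustrative default (assumption), not this model's setting.
    """
    # Implicit reward margin: how much more the policy, relative to the
    # reference, prefers the chosen completion over the rejected one.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: margin is 0, so the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy shifted toward the chosen completion: the loss drops below log(2).
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

In the online setting, the same loss is applied to pairs freshly sampled from the policy at each step rather than to a fixed offline preference dataset.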
