qgallouedec/online-dpo-qwen2-2
Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Architecture: Transformer · Warm
qgallouedec/online-dpo-qwen2-2 is a 0.5-billion-parameter causal language model fine-tuned from Qwen/Qwen2-0.5B-Instruct. As its name suggests, it was trained with Online DPO on prompts from the trl-lib/ultrafeedback-prompt dataset to strengthen its instruction-following behavior. With a context length of 131,072 tokens, it is suited to tasks that require extensive contextual understanding and precise responses to user prompts.
Model Overview
qgallouedec/online-dpo-qwen2-2 is a 0.5-billion-parameter language model based on Qwen/Qwen2-0.5B-Instruct, further refined by fine-tuning on the trl-lib/ultrafeedback-prompt dataset.
Key Capabilities
- Instruction Following: Improved ability to understand and execute instructions, owing to preference fine-tuning on UltraFeedback prompts.
- Context Handling: Supports a context length of 131,072 tokens, allowing it to process and respond to very long inputs.
- Qwen2 Base: Inherits the foundational capabilities and architectural strengths of the Qwen2 family of models.
Good For
- Applications requiring a compact yet capable model for instruction-based tasks.
- Scenarios where processing extensive input contexts is crucial.
- Developing conversational agents or systems that benefit from improved instruction adherence.
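As a minimal sketch of how the model might be used for the instruction-following scenarios above, the snippet below loads it with Hugging Face `transformers` and generates a chat-style reply. The model ID comes from this card; the prompt text and generation settings are illustrative assumptions, not recommendations.

```python
# Hedged sketch: chat-style generation with qgallouedec/online-dpo-qwen2-2.

def build_messages(user_prompt: str) -> list[dict]:
    # Qwen2-Instruct derivatives expect chat-formatted input.
    return [{"role": "user", "content": user_prompt}]

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the helper above stays usable even without the
    # (heavy) transformers dependency installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "qgallouedec/online-dpo-qwen2-2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

    # Render the chat template, leaving the assistant turn open.
    text = tokenizer.apply_chat_template(
        build_messages(prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the echoed prompt.
    reply_ids = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(reply_ids, skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("List three uses for a compact instruction-tuned model."))
```

At 0.5B parameters the model fits comfortably on CPU or a small GPU, which is the main appeal for the compact-deployment use cases listed above.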