FinaPolat/RAISED_QWEN_8B_DPO
The FinaPolat/RAISED_QWEN_8B_DPO is an 8 billion parameter Qwen3-based language model developed by FinaPolat, fine-tuned using DPO from FinaPolat/RAISED_QWEN_8B_SFT. This model was trained with a context length of 32768 tokens and utilizes Unsloth and Huggingface's TRL library for accelerated training. It is designed for general language generation tasks, building upon its SFT predecessor.
Loading preview...
Model Overview
The FinaPolat/RAISED_QWEN_8B_DPO is an 8 billion parameter language model developed by FinaPolat. It is a Qwen3-based model that has been fine-tuned using Direct Preference Optimization (DPO) from its supervised fine-tuned (SFT) counterpart, FinaPolat/RAISED_QWEN_8B_SFT. The model supports a substantial context length of 32768 tokens.
Key Training Details
- Base Model: Qwen3 architecture
- Fine-tuning Method: Direct Preference Optimization (DPO)
- Predecessor Model: FinaPolat/RAISED_QWEN_8B_SFT
- Training Acceleration: Utilizes Unsloth and Huggingface's TRL library, enabling a reported 2x faster training speed.
- License: Apache-2.0
Potential Use Cases
This model is suitable for a variety of general-purpose language generation and understanding tasks, benefiting from its DPO fine-tuning which typically enhances alignment with human preferences. Its efficient training methodology suggests a focus on practical deployment and performance.