HCY123902/qwen25_7b_base_hc_ssss_n32_r1_no_know_in_rubric_dpo
This is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B by HCY123902. It was trained with Direct Preference Optimization (DPO) via the TRL framework to align its outputs more closely with human preferences. The model has a context length of 32,768 tokens and is suited to general text-generation tasks where preference alignment is beneficial.
Model Overview
This model, HCY123902/qwen25_7b_base_hc_ssss_n32_r1_no_know_in_rubric_dpo, is a 7.6-billion-parameter language model derived from the Qwen/Qwen2.5-7B base architecture. It was fine-tuned with Direct Preference Optimization (DPO), implemented through the TRL framework.
Key Characteristics
- Base Model: Qwen/Qwen2.5-7B.
- Parameter Count: 7.6 billion parameters.
- Context Length: Supports a context window of 32,768 tokens.
- Training Method: Direct Preference Optimization (DPO), a technique that aligns language model outputs more closely with human preferences by treating the language model itself as an implicit reward model (a training sketch follows this list).
- Frameworks: Trained with TRL (Transformer Reinforcement Learning) and built on Transformers, PyTorch, Datasets, and Tokenizers.
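The model card does not disclose the exact training recipe, so the following is only a minimal sketch of DPO fine-tuning with TRL's DPOTrainer. The dataset, hyperparameters, and output path are placeholders, not the author's actual configuration.

```python
# Minimal DPO fine-tuning sketch with TRL. Illustrative only: the dataset,
# hyperparameters, and output path are placeholders, not this model's settings.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-7B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO expects preference pairs: each example has a prompt, a preferred
# ("chosen") response, and a dispreferred ("rejected") response.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")  # placeholder dataset

training_args = DPOConfig(
    output_dir="qwen25-7b-dpo",    # placeholder output path
    beta=0.1,                      # strength of the KL penalty toward the reference model
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
)

trainer = DPOTrainer(
    model=model,                   # reference model is created automatically when omitted
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,    # named `tokenizer=` in older TRL releases
)
trainer.train()
```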
Use Cases
This model is well-suited for general text-generation tasks where the goal is to produce outputs aligned with stated preferences, benefiting from its DPO-based fine-tuning. Developers can integrate it using the Hugging Face pipeline for text generation, as in the example below.
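A minimal usage sketch with the Transformers pipeline; the prompt and generation parameters are illustrative, not recommended settings.

```python
# Quick-start text generation with the Hugging Face pipeline.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="HCY123902/qwen25_7b_base_hc_ssss_n32_r1_no_know_in_rubric_dpo",
    torch_dtype="auto",   # load weights in the checkpoint's native precision
    device_map="auto",    # place the model on available GPU(s) automatically
)

output = generator(
    "Explain direct preference optimization in one paragraph.",  # example prompt
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
)
print(output[0]["generated_text"])
```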