HCY123902/qwen25_7b_base_hc_ssts_n32_r1_dpo
Text generation · Model size: 7.6B · Quant: FP8 · Context length: 32k · Concurrency cost: 1 · Architecture: Transformer · Published: Apr 9, 2026

HCY123902/qwen25_7b_base_hc_ssts_n32_r1_dpo is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B using Direct Preference Optimization (DPO) with the TRL library, improving its alignment with human preferences. It is intended for general text generation, where the DPO training improves response quality and relevance within its 32,768-token context window.
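A minimal usage sketch follows. The model ID and the 32,768-token limit come from this card; the generation call inside `generate_example` assumes the standard Hugging Face `transformers` API and illustrative settings (prompt text, `max_new_tokens`) that the card does not prescribe. The small `fit_to_context` helper shows the bookkeeping needed to keep a long prompt plus the requested generation inside the context window.

```python
MODEL_ID = "HCY123902/qwen25_7b_base_hc_ssts_n32_r1_dpo"
CTX_LEN = 32_768  # context window stated on this card


def fit_to_context(token_ids, max_new_tokens, ctx_len=CTX_LEN):
    """Drop the oldest tokens so prompt + generation fits in the window."""
    budget = ctx_len - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return token_ids[-budget:]


def generate_example():
    # Not executed here: requires `transformers`, `torch`, and downloading
    # the 7.6B checkpoint. Shown as an assumed, typical invocation.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    inputs = tok("Explain DPO in one paragraph.", return_tensors="pt")
    ids = fit_to_context(inputs["input_ids"][0].tolist(), max_new_tokens=256)
    inputs = tok.decode(ids)  # re-encode trimmed prompt if it was truncated
    batch = tok(inputs, return_tensors="pt")
    out = model.generate(**batch, max_new_tokens=256)
    return tok.decode(out[0], skip_special_tokens=True)
```

`fit_to_context` keeps the most recent tokens, which is the usual choice for chat-style prompts where the newest turns matter most.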
