simonycl/Qwen3-4B-Instruct-2507-InverseIFEval-DPO
Text generation · Concurrency cost: 1 · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Mar 24, 2026 · Architecture: Transformer

simonycl/Qwen3-4B-Instruct-2507-InverseIFEval-DPO is a 4-billion-parameter instruction-tuned language model, fine-tuned by simonycl from the Qwen/Qwen3-4B-Instruct-2507 base model. It was trained with Direct Preference Optimization (DPO) using the TRL framework to improve alignment with human preferences. The model is intended for generating high-quality, preference-aligned responses in conversational and instruction-following settings, and supports a 32,768-token context length.
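Like other Qwen3 instruct checkpoints, the model is typically prompted through the standard Hugging Face chat-message format (a list of role/content dictionaries passed to `tokenizer.apply_chat_template`). A minimal sketch of assembling such a conversation is shown below; actual model and tokenizer loading (e.g. via `transformers.AutoModelForCausalLM`) is omitted here and assumed to follow the usual `transformers` workflow:

```python
# Sketch: building a chat-format prompt for an instruct-tuned model.
# This only constructs the message list; loading the checkpoint and
# calling tokenizer.apply_chat_template / model.generate would follow.

def build_conversation(user_prompt, system_prompt=None):
    """Return a single-turn conversation in the standard chat format."""
    messages = []
    if system_prompt is not None:
        # Optional system message steering the assistant's behavior.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages

if __name__ == "__main__":
    convo = build_conversation(
        "Summarize the benefits of preference tuning.",
        system_prompt="You are a concise assistant.",
    )
    print([m["role"] for m in convo])  # ['system', 'user']
```

The resulting list is what would be fed to `tokenizer.apply_chat_template(convo, add_generation_prompt=True, ...)` before generation.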
