prithivMLmods/ReasonFlux-Qwen3-dpo
Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

ReasonFlux-Qwen3-dpo by prithivMLmods is a 2-billion-parameter Qwen3-based model fine-tuned with Direct Preference Optimization (DPO) on the ReasonFlux-V2-Reasoner-DPO dataset. It uses a template-augmented reasoning paradigm and iterative hierarchical reinforcement learning, with the aim of making its reasoning more transparent, consistent, and adaptive. The model targets multi-domain scientific, mathematical, and coding tasks, and produces structured outputs with detailed explanations.
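
For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-example DPO loss, not the author's training code. It assumes the summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses have already been computed under both the policy being trained and a frozen reference model. The function name, argument names, and the default beta=0.1 are illustrative assumptions and do not come from this model card.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Inputs are sequence log-probabilities (floats). `beta` scales how
    strongly the policy is pushed away from the reference model; 0.1 is
    an assumed value, not one reported for this model.
    """
    # Implicit rewards: how much more likely each response is under the
    # policy than under the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin between chosen and rejected responses.
    margin = beta * (chosen_reward - rejected_reward)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # computed in a numerically stable way for either sign of margin.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2. The loss falls below that value as the policy assigns relatively more probability to the chosen response than the reference does, and rises above it in the opposite case.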
