ottys/dpo-qwen-cot-merged

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 28, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The ottys/dpo-qwen-cot-merged model is a 4 billion parameter instruction-tuned causal language model based on the Qwen3-4B-Instruct-2507 architecture. Developed by ottys, it utilizes Direct Preference Optimization (DPO) on a filtered subset of official DPO data, focusing on enhancing structured data output accuracy and Chain-of-Thought reasoning. With a 32768 token context length, this model is specifically optimized for tasks requiring precise structured outputs and improved inference processes.

Loading preview...

Overview

This model, ottys/dpo-qwen-cot-merged, is a 4 billion parameter language model derived from the Qwen/Qwen3-4B-Instruct-2507 base. It was developed by ottys using Direct Preference Optimization (DPO) as part of a competition, adhering strictly to specified guidelines regarding base model, training methodology, and data usage.

Key Capabilities

  • Enhanced Structured Data Output: The model is specifically trained to improve the accuracy of structured data generation.
  • Improved Chain-of-Thought (CoT) Reasoning: It aims to strengthen the model's ability to articulate its reasoning process.
  • DPO Fine-tuning: Utilizes DPO with a filtered, high-quality subset of official DPO data, focusing on specific tasks.

Training Details

The model underwent 1 epoch of DPO training with a learning rate of 1e-07 and a beta value of 0.1. The maximum sequence length used during training was 512 tokens. Notably, no new data was generated or modified using AI; all training data was selected from the provided official dataset.

Usage

For evaluation, users are instructed to use the provided "2026 final assignment main competition_standard code 2 (submission JSON generation)" by replacing the model ID with ottys/dpo-qwen-cot-merged.

Licensing

The base model operates under the Apache 2.0 license, and the training data consists solely of the officially distributed dataset.