Name: ottys/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ottys

Overview

This model, ottys/dpo-qwen-cot-merged, is a 4 billion parameter language model derived from the Qwen/Qwen3-4B-Instruct-2507 base. It was developed by ottys using Direct Preference Optimization (DPO) as part of a competition, adhering strictly to specified guidelines regarding base model, training methodology, and data usage.

Key Capabilities

Enhanced Structured Data Output: The model is specifically trained to improve the accuracy of structured data generation.
Improved Chain-of-Thought (CoT) Reasoning: It aims to strengthen the model's ability to articulate its reasoning process.
DPO Fine-tuning: Utilizes DPO with a filtered, high-quality subset of official DPO data, focusing on specific tasks.

Training Details

The model underwent 1 epoch of DPO training with a learning rate of 1e-07 and a beta value of 0.1. The maximum sequence length used during training was 512 tokens. Notably, no new data was generated or modified using AI; all training data was selected from the provided official dataset.

Usage

For evaluation, users are instructed to use the provided "2026 final assignment main competition_standard code 2 (submission JSON generation)" by replacing the model ID with ottys/dpo-qwen-cot-merged.

Licensing

The base model operates under the Apache 2.0 license, and the training data consists solely of the officially distributed dataset.

Overview

Overview

Key Capabilities

Training Details

Usage

Licensing

Full Model Card (README)