jindun/Qwen3-1.7B-GOPD-DeepMath
jindun/Qwen3-1.7B-GOPD-DeepMath is a 1.7 billion parameter Qwen3-based language model, fine-tuned with ExOPD (Extended Group Relative Policy Optimization) on the DeepMath-103K dataset. The model specializes in mathematical reasoning, particularly Olympiad-level problems, learning genuine reasoning skills through trial-and-error exploration rather than imitation. It offers a 32,768-token context length and is optimized for complex mathematical problem solving.
Overview
jindun/Qwen3-1.7B-GOPD-DeepMath is a 1.7 billion parameter model built on the Qwen3-1.7B base architecture. Its key differentiator is the fine-tuning process, which applies ExOPD (Extended Group Relative Policy Optimization) to the challenging DeepMath-103K dataset. This approach develops genuine mathematical reasoning skills through trial-and-error exploration, in contrast with traditional Supervised Fine-Tuning (SFT), which was observed to degrade performance.
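A minimal inference sketch using the Hugging Face `transformers` library is shown below. It assumes the model id on the Hub is `jindun/Qwen3-1.7B-GOPD-DeepMath` and that the tokenizer ships a chat template, as Qwen3 models typically do; the generation settings are illustrative, not prescribed by this card.

```python
# Minimal inference sketch; model id and generation settings are assumptions
# based on this card, not an official usage example.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "jindun/Qwen3-1.7B-GOPD-DeepMath"

def solve(problem: str, max_new_tokens: int = 512) -> str:
    """Generate a solution to a math problem with the fine-tuned model."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    # Format the problem with the model's chat template.
    messages = [{"role": "user", "content": problem}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example call (downloads the model weights on first use):
# print(solve("Find all integers n such that n^2 + n + 1 divides n^3 - 1."))
```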
Key Capabilities
- Advanced Mathematical Reasoning: Specifically trained on a subset of DeepMath-103K containing 8,000 Olympiad-level problems (difficulty $\ge$ 6).
- Policy Optimization: Employs ExOPD, an algorithm combining Group Relative Policy Optimization (GRPO) with Rollout Correction, to learn robust reasoning strategies.
- Trial-and-Error Learning: Demonstrates that learning through exploration can be more effective for complex reasoning tasks than imitation learning; the SFT baseline showed a 16.67% performance degradation in comparison.
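The "group relative" part of GRPO scores each sampled solution against the other rollouts for the same problem: rewards are normalized by the group's mean and standard deviation, so no separate value model is needed. The sketch below illustrates only this normalization step (the Rollout Correction component of ExOPD is not shown, and the function name is my own):

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: normalize each rollout's reward by the
    mean and (population) std of its group of rollouts."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All rollouts scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Binary correctness rewards for 4 sampled solutions to one problem:
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# → [1.0, -1.0, 1.0, -1.0]: correct rollouts are reinforced,
#   incorrect ones penalized, relative to the group.
```

Because advantages are relative within each group, problems where every sample succeeds (or every sample fails) contribute no gradient, which concentrates learning on problems at the edge of the model's ability.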
Training Details
The model was trained for 3 epochs with a batch size of 256 and a learning rate of 1e-5, using Keven16/Qwen3-4B-Non-Thinking-RL-Math-Step500 as a teacher model during optimization.
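The reported hyperparameters could be captured in a configuration fragment like the one below. The field names follow common trainer conventions and are assumptions; only the values (epochs, batch size, learning rate, teacher model) come from this card.

```python
# Hypothetical config fragment mirroring the reported training setup;
# key names are illustrative, values are from the model card.
training_config = {
    "num_train_epochs": 3,
    "train_batch_size": 256,          # reported global batch size
    "learning_rate": 1e-5,
    "teacher_model": "Keven16/Qwen3-4B-Non-Thinking-RL-Math-Step500",
    "dataset": "DeepMath-103K",       # Olympiad-level subset, difficulty >= 6
}
```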
Good For
- Complex Mathematical Problem Solving: Ideal for applications requiring deep mathematical understanding and reasoning, especially for problems at an advanced difficulty level.
- Research in Reinforcement Learning for LLMs: Provides a case study on the effectiveness of policy optimization methods like ExOPD for enhancing reasoning capabilities in language models.