OpenDataArena/Qwen3-8B-ODA-Math-460k

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Dec 31, 2025License:cc-by-nc-4.0Architecture:Transformer0.0K Open Weights Cold

OpenDataArena/Qwen3-8B-ODA-Math-460k is an 8 billion parameter supervised fine-tuned model built on Qwen3-8B-Base, developed by OpenDataArena. It is specifically trained on the ODA-Math-460k dataset, which is curated for mathematical reasoning and competition-style problem-solving. This model is optimized to efficiently improve mathematical reasoning capabilities through high-quality, verified solutions.

Loading preview...

Qwen3-8B-ODA-Math-460k: Specialized for Mathematical Reasoning

This model is a supervised fine-tuned (SFT) version of Qwen3-8B-Base, developed by OpenDataArena, with a focus on enhancing mathematical reasoning. It leverages the unique ODA-Math-460k dataset, a meticulously curated collection of approximately 460,000 math problems.

Key Differentiators & Training Data

The ODA-Math-460k dataset is distinguished by its rigorous curation pipeline:

  • Data Collection: Aggregated from top-performing math datasets identified by the OpenDataArena leaderboard.
  • Deduplication & Decontamination: Exact deduplication and benchmark decontamination to prevent evaluation leakage.
  • Question Filtering: Multi-stage LLM-based filtering for domain specificity, validity, and problem type, excluding proofs and multiple-choice questions to focus on free-form problems.
  • Data Selection: Problems are selected to be challenging for smaller models but solvable by stronger reasoning models, using a two-stage filtering process with Qwen3-8B and Qwen3-30B-A3B.
  • Distillation & Verification: Solutions are distilled using AM-Thinking-v1 as a teacher and verified by Compass-Verifier-7B, ensuring only correct problem-response pairs are included.

Performance

Evaluations show that Qwen3-8B-ODA-Math-460k achieves consistent gains over base checkpoints, with notable improvements on competition-style benchmarks. It demonstrates strong performance across various math datasets, including GSM8K, Math500, Omni-Math, and Olympiad-style problems, achieving an average score of 68.8 across a suite of benchmarks.

Ideal Use Cases

  • Mathematical Problem Solving: Excels in solving complex, free-form mathematical problems.
  • Educational Tools: Can be integrated into platforms requiring robust mathematical reasoning.
  • Competitive Math Preparation: Particularly strong in handling competition-style math challenges.