OpenDataArena/Qwen3-8B-ODA-Mixture-100k

Text Generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Dec 31, 2025 · License: apache-2.0 · Architecture: Transformer

OpenDataArena/Qwen3-8B-ODA-Mixture-100k is an 8-billion-parameter supervised fine-tuned model based on Qwen3-8B-Base, developed by OpenDataArena. It was trained on ODA-Mixture-100k, a compact, curated dataset of approximately 100,000 samples, and shows consistent improvements over its base model across the General, Math, Code, and Reasoning domains.


Model Overview

OpenDataArena/Qwen3-8B-ODA-Mixture-100k is an 8-billion-parameter supervised fine-tuned (SFT) model built on Qwen/Qwen3-8B-Base. Developed by OpenDataArena, it was trained on OpenDataArena/ODA-Mixture-100k, a carefully curated collection of approximately 100,000 samples. The fine-tuning aims to deliver broad general-purpose gains and stronger multi-domain reasoning and problem-solving across the General, Math, Code, and Reasoning domains, all within a compact data budget.
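
The model can be loaded with the standard Hugging Face Transformers API. The snippet below is a minimal sketch: the chat-formatted prompt and generation settings are illustrative assumptions, not documented recommendations from OpenDataArena.

```python
# Minimal usage sketch with Hugging Face Transformers.
# Prompt and generation settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenDataArena/Qwen3-8B-ODA-Mixture-100k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```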

Key Capabilities & Training

The ODA-Mixture-100k dataset was constructed by mixing top-performing open corpora identified via the OpenDataArena leaderboard. This involved a rigorous data curation pipeline including:

  • Data Collection: Utilizing LIMO for a strong reasoning baseline, augmented with AM-Thinking-v1-Distilled-math and AM-Thinking-v1-Distilled-code for specialized domain enhancement.
  • Deduplication & Decontamination: Exact deduplication and benchmark decontamination to minimize evaluation leakage (a minimal sketch of this step follows the list).
  • Data Selection: Employing semantic clustering and preferential sampling of challenging instances (using sequence length as a proxy for complexity) to maximize impact within the 100K sample budget; see the second sketch below.
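
The deduplication and decontamination step can be pictured as exact-match filtering against both the training pool itself and the evaluation benchmark prompts. The sketch below assumes hash-based exact matching on a hypothetical `instruction` field after whitespace normalization; the actual ODA pipeline details are not specified here.

```python
# Sketch of exact deduplication + benchmark decontamination.
# Assumes each sample is a dict with a hypothetical "instruction" field.
import hashlib

def fingerprint(text: str) -> str:
    # Collapse whitespace and lowercase so trivially different copies collide.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def dedup_and_decontaminate(samples, benchmark_prompts):
    contaminated = {fingerprint(p) for p in benchmark_prompts}
    seen, kept = set(), []
    for sample in samples:
        h = fingerprint(sample["instruction"])
        if h in seen or h in contaminated:
            continue  # drop exact duplicates and potential evaluation leaks
        seen.add(h)
        kept.append(sample)
    return kept
```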

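One way to picture the selection step: embed every sample, cluster the embeddings, and keep the longest instances per cluster until the budget is filled. The sketch below is an illustration under assumptions (a sentence-transformers embedding model, an even per-cluster budget, and a hypothetical `instruction` field); it is not the authors' implementation.

```python
# Sketch of semantic clustering + length-based preferential sampling.
# Embedding model, cluster count, and field names are assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def select_within_budget(samples, budget=100_000, n_clusters=1_000):
    texts = [s["instruction"] for s in samples]
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)
    labels = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(embeddings)

    per_cluster = budget // n_clusters  # spread the budget evenly
    selected = []
    for c in range(n_clusters):
        members = [s for s, lbl in zip(samples, labels) if lbl == c]
        # Sequence length as a proxy for difficulty: prefer the longest.
        members.sort(key=lambda s: len(s["instruction"]), reverse=True)
        selected.extend(members[:per_cluster])
    return selected
```
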
Performance Highlights

Evaluated on the full ODA benchmark suite, Qwen3-8B-ODA-Mixture-100k shows consistent improvements over Qwen3-8B-Base, achieving an average score of 69.0 across the General, Math, Code, and Reasoning benchmarks versus the base model's 53.2. Gains are especially strong in Math (77.3) and Code (73.2), reflecting the targeted data curation in those domains.