clarkkitchen22/Qwen3-8B-GSM8K-Synth-50K

Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Feb 28, 2026 · License: apache-2.0 · Architecture: Transformer (open weights)

clarkkitchen22/Qwen3-8B-GSM8K-Synth-50K is an 8-billion-parameter Qwen3 model fine-tuned by clarkkitchen22 on 50,418 synthetic grade-school math problems. It excels at step-by-step mathematical reasoning, producing structured chain-of-thought output inside <think> tags. It achieves 86.2% accuracy on the GSM8K test set, a 6.8-percentage-point improvement over the base Qwen3-8B model, making it well suited to math tutoring and to research into the impact of synthetic data on reasoning.


Model Overview

clarkkitchen22/Qwen3-8B-GSM8K-Synth-50K is an 8 billion parameter Qwen3 model, specifically fine-tuned using QLoRA on a dataset of 50,418 synthetic grade-school math problems. Its primary function is to provide structured, step-by-step mathematical reasoning for word problems, outputting solutions within <think> tags before stating the final numerical answer.
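The output format described above (a <think> reasoning trace followed by the final number) can be split apart with a small helper. A minimal sketch; the `parse_solution` function is illustrative and not part of this model's tooling:

```python
import re

def parse_solution(text: str):
    """Split a model response into its reasoning trace and final answer.

    Assumes the <think>...</think> format described above, with the
    final numerical answer stated after the closing tag.
    """
    match = re.search(r"<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if match is None:
        # Fall back: no reasoning tags found, treat everything as the answer.
        return None, text.strip()
    reasoning = match.group(1).strip()
    answer_text = match.group(2).strip()
    # Take the last number in the trailing text as the final answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", answer_text.replace(",", ""))
    answer = float(numbers[-1]) if numbers else None
    return reasoning, answer

reasoning, answer = parse_solution(
    "<think>Tom has 3 bags with 4 apples each, so 3 * 4 = 12.</think> The answer is 12."
)
```

Taking the last number after the closing tag is a common heuristic for this kind of output; a stricter parser could instead require an explicit "The answer is N." pattern.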

Key Capabilities & Performance

  • Enhanced Mathematical Reasoning: Achieves 86.2% accuracy on the full GSM8K test set, a +6.8-percentage-point improvement over the base Qwen3-8B model.
  • Structured Outputs: Generates clear, concise step-by-step solutions, leading to approximately 2.7x faster inference compared to the base model's verbose outputs.
  • Synthetic Data Efficacy: Demonstrates that fine-tuning with synthetic data (generated by Claude Haiku 4.5 and filtered through an 8-stage quality pipeline) substantially boosts performance on math reasoning tasks.
  • Memory Efficient Training: Trained on an NVIDIA RTX 4070 SUPER (12GB VRAM) using Unsloth's QLoRA and various memory optimizations, showcasing efficient resource utilization.
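GSM8K accuracy figures like the one above are conventionally computed by exact match on the final number, since each reference solution ends with a "#### <answer>" line. A minimal sketch of that scoring step, assuming this model card's evaluation follows the standard convention (function names are illustrative):

```python
import re

def extract_gold_answer(gsm8k_solution: str) -> float:
    """GSM8K reference solutions end with '#### <answer>'."""
    return float(gsm8k_solution.split("####")[-1].strip().replace(",", ""))

def extract_model_answer(completion: str):
    """Take the last number in the completion as the model's answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return float(numbers[-1]) if numbers else None

def accuracy(completions, solutions) -> float:
    """Fraction of completions whose final number matches the gold answer."""
    correct = sum(
        extract_model_answer(c) == extract_gold_answer(s)
        for c, s in zip(completions, solutions)
    )
    return correct / len(solutions)

score = accuracy(
    ["<think>6 * 3 = 18</think> The answer is 18.", "The answer is 7."],
    ["6 * 3 = <<6*3=18>>18\n#### 18", "...\n#### 9"],
)
```

On this toy pair the first completion matches and the second does not, giving a score of 0.5.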

Intended Use Cases

  • Math Tutoring: Ideal for generating detailed, step-by-step solutions to grade-school math problems.
  • Research: Valuable for studying the impact of synthetic data and model scale on mathematical reasoning abilities.
  • Further Fine-tuning: Serves as a strong baseline for developing more specialized math reasoning models.

Limitations

  • Performance is bounded by the math ability of the generating model (Claude Haiku 4.5).
  • Optimized for GSM8K-style arithmetic and basic algebra; not designed for advanced math like calculus or geometry.
  • May struggle with problems requiring negative answers, as training data primarily features non-negative solutions.
  • Relies on a specific <think> tag format; other prompting styles might yield suboptimal results.
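Because of the format sensitivity noted in the last point, prompts should explicitly request the <think> structure the model was fine-tuned on. A minimal chat-message sketch; the exact instruction wording here is an assumption, not taken from this model card:

```python
def build_messages(question: str) -> list:
    """Chat messages requesting the <think> reasoning format.

    The system-prompt wording is illustrative; check the model card's
    examples for the exact phrasing used during fine-tuning.
    """
    return [
        {
            "role": "system",
            "content": (
                "Solve the math problem. Reason step by step inside "
                "<think>...</think> tags, then state the final numerical answer."
            ),
        },
        {"role": "user", "content": question},
    ]

messages = build_messages("A farmer has 12 eggs and sells 5. How many remain?")
```

A message list in this shape can be passed to a chat template (e.g. `tokenizer.apply_chat_template`) before generation.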