KKHYA/qwen3-14b-fft-math

Text Generation · Open Weights

  • Model Size: 14B
  • Quantization: FP8
  • Context Length: 32k
  • Concurrency Cost: 1
  • Published: Apr 29, 2026
  • License: apache-2.0
  • Architecture: Transformer

KKHYA/qwen3-14b-fft-math is a 14-billion-parameter language model fine-tuned from Qwen/Qwen3-14B with a 32768-token context length. It is optimized for mathematical reasoning, having been trained on a mix of math-focused datasets spanning the mft_metamath, mft_numinamath, and mft_tulu3_personas_math families. The model is designed for complex mathematical problem-solving and algebraic reasoning.


Model Overview

KKHYA/qwen3-14b-fft-math is a 14-billion-parameter language model built on Qwen/Qwen3-14B. Its 32768-token context length makes it suitable for lengthy mathematical problems and their supporting context.
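
As a minimal usage sketch, assuming the checkpoint loads through the standard Hugging Face transformers API under the repo id shown above (the prompt and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KKHYA/qwen3-14b-fft-math"  # repo id from the model card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place layers across available GPUs
)

# Qwen3-style chat formatting via the tokenizer's chat template.
messages = [{"role": "user", "content": "Solve for x: 3x + 7 = 22. Show your steps."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```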

Key Capabilities

This model has undergone specialized fine-tuning to enhance its performance in mathematical domains. Its training involved a curated selection of datasets (a data-assembly sketch follows the list), including:

  • mft_metamath: A dataset focused on mathematical reasoning.
  • mft_numinamath_tir and mft_numinamath_cot: NuminaMath-derived sets, likely covering tool-integrated reasoning (TIR) and chain-of-thought (CoT) solutions respectively.
  • mft_tulu3_personas_math, mft_tulu3_personas_math_grade, and mft_tulu3_personas_algebra: Datasets designed to improve mathematical problem-solving across different grade levels and algebraic concepts.
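
A mixture like this is typically assembled by concatenating the individual SFT sets before training. A minimal sketch with the Hugging Face datasets library, assuming each mft_* set is available locally as JSONL with a shared schema (the file names here are hypothetical):

```python
from datasets import load_dataset, concatenate_datasets

# Hypothetical local JSONL exports of the mft_* sets named above.
files = [
    "mft_metamath.jsonl",
    "mft_numinamath_tir.jsonl",
    "mft_numinamath_cot.jsonl",
    "mft_tulu3_personas_math.jsonl",
    "mft_tulu3_personas_math_grade.jsonl",
    "mft_tulu3_personas_algebra.jsonl",
]

# concatenate_datasets requires identical features across parts
# (e.g. a common "messages" column).
parts = [load_dataset("json", data_files=f, split="train") for f in files]
train_set = concatenate_datasets(parts).shuffle(seed=42)
print(len(train_set))
```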

Training Details

The fine-tuning process used the following hyperparameters (an illustrative training configuration follows the list):

  • Learning Rate: 1e-05
  • Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.95) and epsilon=1e-08
  • Batch Size: an effective train batch size of 128 (1 per device × 16 gradient-accumulation steps × 8 GPUs).
  • Epochs: Trained for 2.0 epochs.
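
The card does not state the training framework, but as a sketch of how those hyperparameters would map onto Hugging Face TrainingArguments (the output path and bf16 setting are assumptions):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen3-14b-fft-math",    # hypothetical output path
    learning_rate=1e-5,
    optim="adamw_torch_fused",          # ADAMW_TORCH_FUSED from the card
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    per_device_train_batch_size=1,      # per GPU
    gradient_accumulation_steps=16,     # 1 x 16 x 8 GPUs = 128 effective
    num_train_epochs=2.0,
    bf16=True,                          # assumption: bf16 mixed precision
)
```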

Good For

  • Mathematical Reasoning: Excels in tasks requiring logical deduction and problem-solving in mathematics.
  • Algebraic Problems: The training mix includes algebra-focused data (mft_tulu3_personas_algebra), suggesting strong performance in this area.
  • Educational Applications: Potentially useful for generating explanations or solving problems in math education contexts.