xx18/Baseline-4B-MATH12K

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Feb 12, 2026 | Architecture: Transformer

xx18/Baseline-4B-MATH12K is a 4-billion-parameter language model developed by xx18 and fine-tuned for mathematical reasoning. It is built on Qwen3-8B-Base and was trained with the Composition-RL framework on the MATH-Composition-199K dataset. The model targets complex, compositional mathematical problems; during training, Composition-RL composes verifiable problems into harder prompts for reinforcement learning. Its context length is 32,768 tokens, which leaves room for long, multi-step solutions.


Model Overview

xx18/Baseline-4B-MATH12K is a 4-billion-parameter language model developed by xx18 and fine-tuned for mathematical reasoning. It is initialized from Qwen3-8B-Base and trained with the Composition-RL framework on the MATH-Composition-199K dataset.

Key Capabilities & Training

This model's core strength is handling complex, compositional mathematical problems. Composition-RL is a data-efficient approach to Reinforcement Learning with Verifiable Rewards (RLVR). It addresses the problem of "too-easy" prompts in traditional RL, which the policy already solves reliably and which therefore carry little learning signal, by automatically composing multiple verifiable problems into a single, more challenging prompt. The aim is to keep the reward signal informative throughout reinforcement learning and so strengthen the model's reasoning.
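The idea of composing verifiable problems can be illustrated with a small sketch. This is not the Composition-RL implementation, whose composition scheme is not described here; it assumes the simplest possible variant, in which problems are concatenated into one prompt and the reward is 1 only if every final answer matches its reference. The names `Problem`, `compose`, and `reward`, and the "Answer i:" output format, are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    question: str
    answer: str  # reference answer checked by the verifier


def compose(problems):
    """Join several problems into one prompt (assumed scheme: plain concatenation)."""
    parts = [f"Problem {i}: {p.question}" for i, p in enumerate(problems, start=1)]
    instructions = (
        "Solve every problem. Finish with one line per problem "
        "of the form 'Answer i: <value>'."
    )
    return "\n".join(parts + [instructions])


# Matches lines such as "Answer 2: 42" anywhere in the completion.
ANSWER_LINE = re.compile(r"^Answer (\d+):\s*(.+?)\s*$", re.MULTILINE)


def reward(problems, completion):
    """Binary verifiable reward: 1.0 only if all sub-answers are exactly correct."""
    found = {int(i): value for i, value in ANSWER_LINE.findall(completion)}
    all_correct = all(
        found.get(i) == p.answer for i, p in enumerate(problems, start=1)
    )
    return 1.0 if all_correct else 0.0


problems = [Problem("What is 2 + 3?", "5"), Problem("What is 7 * 6?", "42")]
prompt = compose(problems)
```

Because the reward requires every sub-answer to be right, a composed prompt is at least as hard as its hardest component, which is the property the paragraph above relies on. Exact string matching is a deliberate simplification; a practical verifier would normalize mathematically equivalent answers.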

  • Base Model: Qwen3-8B-Base
  • Training Method: Reinforcement Learning with Verifiable Rewards (RLVR)
  • Training Dataset: MATH-Composition-199K
  • Context Length: 32768 tokens

When to Use This Model

This model is well suited to:

  • Solving multi-step mathematical problems.
  • Tasks that benefit from verifiable reasoning steps.
  • Research into advanced reinforcement learning techniques for LLMs.
  • Educational tools focused on mathematical problem-solving.
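
For the problem-solving use cases listed above, a minimal loading sketch might look like the following. It assumes the weights are published on the Hugging Face Hub under the id `xx18/Baseline-4B-MATH12K` and load with the standard `transformers` causal-LM classes; neither is confirmed here. The helpers `generation_budget` and `solve` are hypothetical, and the plain-text prompt is an assumption, since no prompt or chat format is documented.

```python
MODEL_ID = "xx18/Baseline-4B-MATH12K"  # assumed Hugging Face Hub id
CONTEXT_LENGTH = 32768  # context length stated on this page


def generation_budget(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Return how many tokens remain for generation after the prompt."""
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt does not fit in the context window")
    return remaining


def solve(problem, max_new_tokens=1024):
    """Generate a solution; downloads the weights on first use."""
    # Imported lazily so the budget helper works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed in the page header.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(problem, return_tensors="pt")
    prompt_tokens = inputs["input_ids"].shape[1]
    # Never request more tokens than the context window can hold.
    budget = min(max_new_tokens, generation_budget(prompt_tokens))

    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=budget)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][prompt_tokens:], skip_special_tokens=True)
```

Calling `solve("Find all real x with x^2 - 5x + 6 = 0.")` would then return the model's generated text. Multi-step solutions can be long, so `max_new_tokens` may need to be raised well above the default used here.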