xx18/Baseline-4B-MATH12K
xx18/Baseline-4B-MATH12K is a 4-billion-parameter language model developed by xx18 and fine-tuned for mathematical reasoning. Built on Qwen3-8B-Base, it was trained with the Composition-RL framework on the MATH-Composition-199K dataset. It specializes in complex, compositional mathematical problems. During training, Composition-RL combined multiple verifiable problems into harder prompts for reinforcement learning. The model supports a 32,768-token context length, which leaves room for long, multi-step solutions.
Model Overview
xx18/Baseline-4B-MATH12K is a 4-billion-parameter language model developed by xx18 and fine-tuned for mathematical reasoning. It is initialized from Qwen3-8B-Base and trained with the Composition-RL framework on the MATH-Composition-199K dataset.
Key Capabilities & Training
The model's core strength is handling complex, compositional mathematical problems. Composition-RL is a data-efficient Reinforcement Learning with Verifiable Rewards (RLVR) approach. It targets a known weakness of RLVR: prompts that are "too easy." If the model already solves a prompt on every attempt, every attempt receives the same reward, so that prompt provides little or no learning signal. Composition-RL addresses this by automatically composing multiple verifiable problems into a single, more challenging prompt. This keeps the reward signal informative throughout training and is intended to strengthen the model's reasoning.
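The "too-easy prompt" problem and the effect of composition can be illustrated with a short sketch. It makes two assumptions. First, it uses a GRPO-style, group-normalized advantage, although the card does not name the policy-optimization algorithm. Second, it uses a hypothetical sequential composition scheme in which the answer to one problem feeds into the next. `compose_sequential`, `verifiable_reward`, and `group_advantages` are illustrative names, not part of any released Composition-RL code, and the rollout answers are made up.

```python
import statistics


def compose_sequential(problem_a: str, problem_b: str, var: str = "X") -> str:
    """Chain two problems so that problem_b uses the answer of problem_a as `var`.

    Hypothetical composition scheme, assumed for illustration only.
    """
    return (
        f"Let {var} be the answer to the following problem: {problem_a}\n"
        f"Then solve: {problem_b}"
    )


def verifiable_reward(predicted: str, reference: str) -> float:
    """Binary verifiable reward: 1.0 if the final answer matches exactly, else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages (assumed): normalize rewards within one rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # Every rollout scored the same, so this prompt yields no learning signal.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


# Made-up rollouts: the policy solves the easy component every time...
easy_rewards = [verifiable_reward(a, "12") for a in ["12", "12", "12", "12"]]
print(group_advantages(easy_rewards))  # [0.0, 0.0, 0.0, 0.0]

# ...but only sometimes solves the composed prompt (3 * 4 = 12, then 12 + 5 = 17).
prompt = compose_sequential("What is 3 * 4?", "What is X + 5?")
print(prompt)
composed_rewards = [verifiable_reward(a, "17") for a in ["17", "16", "17", "12"]]
print(group_advantages(composed_rewards))  # [1.0, -1.0, 1.0, -1.0]
```

Under this assumed scheme, a composed prompt counts as solved only when every component is solved. Its success rate is therefore lower than that of its parts, which restores reward variance within the rollout group.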
- Base Model: Qwen3-8B-Base
- Training Method: Composition-RL, a Reinforcement Learning with Verifiable Rewards (RLVR) approach
- Training Dataset: MATH-Composition-199K
- Context Length: 32768 tokens
When to Use This Model
This model is particularly well-suited for applications requiring:
- Solving multi-step mathematical problems.
- Tasks with verifiable answers, where a solution can be checked automatically for correctness.
- Research into advanced reinforcement learning techniques for LLMs.
- Educational tools focused on mathematical problem-solving.
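The model was trained with verifiable rewards, so tasks with checkable answers are a natural fit. The sketch below shows one minimal way to check such an answer. It assumes the model writes its final answer in a LaTeX `\boxed{...}` expression, a common convention for math models that this card does not confirm for this model. `extract_boxed` and `answers_match` are hypothetical helpers. Exact rational comparison is also a simplification of the symbolic checking that math verifiers often perform.

```python
from fractions import Fraction
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, allowing nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len("\\boxed{"):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # The braces never closed, so there is no usable answer.


def answers_match(predicted: str, reference: str) -> bool:
    """Compare as exact rationals when possible, else as whitespace-free strings."""
    try:
        return Fraction(predicted.strip()) == Fraction(reference.strip())
    except (ValueError, ZeroDivisionError):
        return predicted.replace(" ", "") == reference.replace(" ", "")


# Hypothetical model completion, assumed to follow the \boxed{} convention.
completion = r"Adding the two fractions gives \boxed{0.75}."
answer = extract_boxed(completion)
print(answer, answers_match(answer, "3/4"))  # 0.75 True
```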