Name: xx18/Baseline-4B-MATH12K API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: xx18

Model Overview

xx18/Baseline-4B-MATH12K is a 4 billion parameter language model developed by xx18, specifically fine-tuned for advanced mathematical reasoning. It is initialized from the Qwen3-8B-Base architecture and trained using the innovative Composition-RL framework on the MATH-Composition-199K dataset.

Key Capabilities & Training

This model's core strength lies in its ability to handle complex, compositional mathematical problems. The Composition-RL framework is a data-efficient Reinforcement Learning with Verifiable Rewards (RLVR) approach. It addresses the challenge of "too-easy" prompts in traditional RL by automatically composing multiple verifiable problems into a single, more challenging prompt. This ensures a continuous and informative reward signal throughout the reinforcement learning process, leading to robust reasoning capabilities.

Base Model: Qwen3-8B-Base
Training Method: Reinforcement Learning with Verifiable Rewards (RLVR)
Training Dataset: MATH-Composition-199K
Context Length: 32768 tokens

When to Use This Model

This model is particularly well-suited for applications requiring:

Solving multi-step mathematical problems.
Tasks that benefit from verifiable reasoning steps.
Research into advanced reinforcement learning techniques for LLMs.
Educational tools focused on mathematical problem-solving.

Overview

Model Overview

Key Capabilities & Training

When to Use This Model

Full Model Card (README)