jaygala24/Qwen3-4B-ReMax-math-reasoning

Text Generation · Concurrency cost: 1 · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Apr 13, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

The jaygala24/Qwen3-4B-ReMax-math-reasoning model is a fine-tuned version of the Qwen3-4B architecture, specifically optimized for mathematical reasoning tasks. Developed by jaygala24, this model leverages the ReMax reinforcement learning algorithm without a KL penalty to enhance its problem-solving capabilities. It demonstrates strong performance on benchmarks like GSM8K and MATH-500, making it suitable for applications requiring accurate step-by-step mathematical solutions.


Overview

This model, jaygala24/Qwen3-4B-ReMax-math-reasoning, is a specialized fine-tune of the Qwen3-4B base model, developed by jaygala24. Its primary focus is on mathematical reasoning, achieved through fine-tuning with the ReMax reinforcement learning algorithm (without KL penalty) using the PipelineRL framework.

Key Capabilities & Performance

The model has been trained on mathematical datasets including gsm8k_train and math_train, and evaluated on gsm8k_test and math_500. It exhibits strong performance in mathematical problem-solving, as evidenced by its pass@k scores:

  • GSM8K (test): 89.23% pass@1, 96.13% pass@32
  • MATH-500: 81.25% pass@1, 96.60% pass@32
  • Overall: 87.04% pass@1, 96.26% pass@32

These results were obtained by generating 32 samples per problem at a sampling temperature of 1.0.
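Pass@k figures like those above are commonly computed with the standard unbiased combinatorial estimator: given n samples per problem of which c are correct, the probability that at least one of k drawn samples is correct is 1 − C(n−c, k) / C(n, k). The sketch below assumes this model card uses that estimator (the card does not state it explicitly):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total samples generated per problem (32 for this model card)
    c: number of those samples that are correct
    k: budget of attempts being evaluated
    Returns the probability that at least one of k samples is correct.
    """
    if n - c < k:
        # Fewer incorrect samples than k: some correct sample is guaranteed.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: with 16 of 32 samples correct, pass@1 is 0.5.
print(pass_at_k(32, 16, 1))
```

The per-problem estimates are then averaged over the test set to produce benchmark-level numbers such as the 89.23% GSM8K pass@1 reported above.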

Training Details

The fine-tuning process used a learning rate of 1e-06, a sequence length of 8192, and an effective batch size of 256. The ReMax algorithm used the reward of a greedy-decoded response as the advantage baseline, performing one deterministic rollout per prompt. Full training logs are available on Weights & Biases.
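The ReMax objective described above can be sketched as a REINFORCE-style surrogate loss whose baseline is the reward of the greedy-decoded response; without a KL penalty, the advantage is simply the sampled reward minus that baseline. The function names below are illustrative, not taken from PipelineRL:

```python
def remax_advantage(sampled_reward: float, greedy_reward: float) -> float:
    """ReMax advantage: sampled-response reward minus the reward of the
    greedy-decoded response for the same prompt (the variance-reduction
    baseline). No KL term is added, matching this model's training setup."""
    return sampled_reward - greedy_reward

def remax_loss(token_logprobs: list, sampled_reward: float,
               greedy_reward: float) -> float:
    """REINFORCE-style surrogate loss: negative advantage-weighted
    log-likelihood of the sampled response (sum of its token log-probs)."""
    advantage = remax_advantage(sampled_reward, greedy_reward)
    return -advantage * sum(token_logprobs)

# Toy example: correct sampled answer (reward 1.0), wrong greedy answer
# (reward 0.0), two tokens with log-prob -0.5 each.
print(remax_loss([-0.5, -0.5], 1.0, 0.0))
```

Because the baseline is computed from a single deterministic rollout, each training prompt needs only one extra (greedy) generation beyond the sampled one, which keeps the rollout cost low compared with multi-sample baselines.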

Good for

  • Applications requiring accurate mathematical reasoning and step-by-step problem-solving.
  • Tasks involving arithmetic, algebra, and other quantitative challenges where a high pass rate on multiple attempts is beneficial.