Name: xx18/Composition-RL-4B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: xx18

Overview of Composition-RL-4B

xx18/Composition-RL-4B is a 4 billion parameter model, fine-tuned from the Qwen3-8B-Base architecture. It utilizes the Composition-RL framework, a data-efficient Reinforcement Learning with Verifiable Rewards (RLVR) approach, to enhance its reasoning abilities.

Key Capabilities and Training

The core innovation of Composition-RL lies in its ability to automatically compose multiple verifiable problems into a single, more complex prompt. This method addresses the issue of "too-easy" prompts during RL training, ensuring the model consistently receives challenging and informative signals. The model was trained using the MATH-Composition-199K dataset.

What Makes This Model Different?

Unlike standard fine-tuning or RL methods that might struggle with diminishing returns on simpler tasks, Composition-RL ensures continuous learning by dynamically increasing prompt complexity. This leads to improved performance, particularly in:

Mathematical reasoning
Scientific problem-solving

This approach is detailed in the paper: Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models.

Overview

Overview of Composition-RL-4B

Key Capabilities and Training

What Makes This Model Different?

Full Model Card (README)