spiral-rl/Spiral-Qwen3-4B

Parameters: 4B · Precision: BF16 · Context length: 40960
License: apache-2.0

Model Overview

Spiral-Qwen3-4B is a 4-billion-parameter language model developed by spiral-rl, built on the Qwen3 base architecture. Its core innovation is its training methodology: the SPIRAL framework, which uses self-play on multi-turn, zero-sum games (such as TicTacToe, Kuhn Poker, and Simple Negotiation). This approach lets the model develop sophisticated reasoning strategies without relying on expert-curated problem-answer pairs or domain-specific reward engineering.
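The zero-sum self-play setup can be illustrated with a minimal sketch: two copies of the same policy (here a uniformly random one, standing in for the model) alternate moves in TicTacToe, and the two roles receive opposite rewards. This is illustrative only; in SPIRAL the policy is the language model itself and the environments include the games listed above.

```python
import random

# The eight winning lines on a 3x3 board (rows, columns, diagonals).
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has a winning line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def self_play_episode(rng):
    """One zero-sum TicTacToe game where both roles are driven by the
    same (here: uniformly random) policy; returns the reward for X.
    Zero-sum means O's reward is always the negative of this value."""
    board = [" "] * 9
    player = "X"
    while True:
        moves = [i for i, cell in enumerate(board) if cell == " "]
        if not moves:
            return 0.0                      # draw: zero reward for both
        board[rng.choice(moves)] = player
        w = winner(board)
        if w is not None:
            return 1.0 if w == "X" else -1.0
        player = "O" if player == "X" else "X"

rng = random.Random(0)
rewards = [self_play_episode(rng) for _ in range(1000)]
```

Because both sides share the same policy, any improvement to the policy simultaneously strengthens the opponent, which is what produces the self-generated curriculum of progressively harder games.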

Key Capabilities & Training

  • Autonomous Reasoning Development: SPIRAL enables models to learn by playing against continuously improving versions of themselves, generating an infinite curriculum of progressively challenging problems.
  • Transferable Reasoning: Through zero-sum self-play, the model develops advanced reasoning strategies that lead to substantial gains on a range of math and general reasoning benchmarks.
  • Actor-Learner Architecture: Employs a scalable actor-learner architecture where parallel actors sample trajectories from diverse games, and a centralized learner processes these using Role-conditioned Advantage Estimation (RAE) for on-policy reinforcement learning updates.
  • High Context Length: Supports a context length of 40960 tokens, allowing the model to process longer inputs and maintain conversational coherence.
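The role-conditioned baseline idea behind RAE can be sketched in a few lines. The sketch below assumes RAE keeps a running mean return per (game, role) pair and subtracts it from each trajectory's return; the class name, decay rule, and keys are illustrative, not the exact formulation used by SPIRAL.

```python
from collections import defaultdict

class RoleConditionedAdvantage:
    """Minimal sketch of a role-conditioned baseline in the spirit of
    Role-conditioned Advantage Estimation (RAE): the advantage of a
    trajectory is its return minus a baseline tracked separately for
    each (game, role) pair, so that asymmetric roles (e.g. first vs.
    second player) are not penalized by a shared baseline."""

    def __init__(self, decay=0.9):
        self.decay = decay
        # Running (exponential moving average) baseline per (game, role).
        self.baseline = defaultdict(float)

    def advantage(self, game, role, ret):
        key = (game, role)
        adv = ret - self.baseline[key]
        # EMA update of the role-conditioned baseline.
        self.baseline[key] = (self.decay * self.baseline[key]
                              + (1 - self.decay) * ret)
        return adv
```

In an actor-learner setup, parallel actors would report `(game, role, return)` tuples for their sampled trajectories, and the centralized learner would use these advantages in its on-policy policy-gradient update.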

Ideal Use Cases

  • Research in AI Reasoning: Excellent for exploring autonomous reasoning development and self-play reinforcement learning.
  • Complex Problem Solving: Suitable for tasks requiring advanced logical deduction and strategic thinking, particularly in game-like or adversarial scenarios.
  • Benchmarking Reasoning Abilities: Can be used to evaluate and compare reasoning capabilities against other models, especially in mathematical and general reasoning domains.