Model Overview
LlameUser/qwen-3-4b-thinking-r1-st is a specialized language model derived from the Qwen/Qwen3-4B-Thinking-2507 base model. It has been fine-tuned with the TRL library to strengthen its performance on mathematical reasoning and related tasks.
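Like other Qwen3 thinking-series checkpoints, this model emits its chain-of-thought before a closing `</think>` tag, with the final answer after it (depending on the chat template, the opening `<think>` tag may already be part of the prompt rather than the completion). A minimal sketch for separating the two parts of a completion; the helper name is illustrative, not part of any library:

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a thinking-model completion into (reasoning, answer).

    Handles both '<think>...</think>answer' and '...</think>answer'
    (the latter occurs when the chat template opens the <think> tag).
    """
    marker = "</think>"
    if marker not in text:
        # No closing tag: treat the whole completion as the answer.
        return "", text.strip()
    reasoning, answer = text.split(marker, 1)
    reasoning = reasoning.removeprefix("<think>")
    return reasoning.strip(), answer.strip()


# Example completion in the format the model is expected to produce.
demo = "<think>2 + 2 is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(demo)
print(answer)  # The answer is 4.
```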
Key Capabilities
- Enhanced Mathematical Reasoning: This model's training procedure specifically incorporates the GRPO (Group Relative Policy Optimization) method. GRPO, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is designed to significantly boost a model's ability to handle complex mathematical problems and logical-reasoning tasks.
- Instruction Following: As a fine-tuned model, it is expected to follow user instructions effectively, particularly in contexts related to its specialized training.
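GRPO dispenses with a learned value model: for each prompt it samples a group of completions, scores each with a reward, and uses the group-normalized reward as the advantage signal. A minimal sketch of that normalization step, following the formulation in the DeepSeekMath paper (variable names and the epsilon stabilizer are our choices):

```python
from statistics import mean, stdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Group-relative advantage of each sampled completion:
    A_i = (r_i - mean(r)) / (std(r) + eps)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]


# Completions that beat the group average get positive advantages,
# those below it get negative ones; the advantages sum to zero.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```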
Good For
- Mathematical Problem Solving: Ideal for applications requiring robust mathematical reasoning, such as solving equations, logical puzzles, or generating step-by-step mathematical solutions.
- Reasoning-intensive Tasks: Suitable for use cases where logical deduction and structured thinking are paramount.
- Research and Development: Provides a strong base for further experimentation and fine-tuning on specific mathematical or reasoning datasets.
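For further GRPO-style fine-tuning, frameworks such as TRL score each sampled completion with one or more reward functions. A minimal rule-based sketch for exact-match math rewards, assuming completions end with a line like `Answer: 42`; the function and its signature are illustrative only, not the exact interface TRL expects:

```python
import re


def exact_answer_reward(completions: list[str], answers: list[str]) -> list[float]:
    """Score 1.0 when a completion's 'Answer: ...' line matches the
    reference answer, else 0.0. Rule-based rewards of this kind are a
    common choice for GRPO on math data."""
    rewards = []
    for completion, reference in zip(completions, answers):
        match = re.search(r"Answer:\s*(.+)", completion)
        predicted = match.group(1).strip() if match else ""
        rewards.append(1.0 if predicted == reference.strip() else 0.0)
    return rewards


print(exact_answer_reward(["x = 3, so Answer: 3"], ["3"]))  # [1.0]
```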