anurag203/clarify-rl-run4-qwen3-1.7b-beta0.2

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 25, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

anurag203/clarify-rl-run4-qwen3-1.7b-beta0.2 is a 1.7 billion parameter Qwen3-based language model, fine-tuned with TRL's GRPO using a KL anchor (β=0.2) against its frozen base. It is optimized for agentic, multi-turn interactions: when a user request is ambiguous, the model is trained to ask clarifying questions before attempting to act. This "ask-first" policy suits scenarios such as event planning or medical intake, where guessing at missing information would otherwise lead to hallucination.


What is anurag203/clarify-rl-run4-qwen3-1.7b-beta0.2?

This model is a 1.7 billion parameter Qwen3-based language model, fine-tuned by anurag203 using Group Relative Policy Optimization (GRPO) with a KL anchor (β=0.2). The primary goal of this training was to steer the model towards an "ask-first" policy: clarifying underspecified user requests through questions before proposing a plan.
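Because the model is Qwen3-based, requests are expected in Qwen's ChatML-style turn format. The sketch below renders messages into that format by hand so the shape is visible; the helper name and system prompt are illustrative, and in practice you should use the tokenizer's `apply_chat_template()` so the exact template is always correct.

```python
# Minimal sketch of the ChatML-style prompt format used by the Qwen family.
# The special tokens (<|im_start|>, <|im_end|>) follow the base model's
# documented template; prefer tokenizer.apply_chat_template() in real code.

def build_qwen3_prompt(messages: list[dict]) -> str:
    """Render a list of {"role", "content"} messages into one prompt string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Open the assistant turn so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_qwen3_prompt([
    {"role": "system", "content": "Clarify ambiguous requests before acting."},
    {"role": "user", "content": "Plan an event for my team."},
])
print(prompt)
```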

Key Capabilities and Differentiators

  • Clarification-Oriented Agent: Unlike general chat assistants, this model is explicitly trained to ask clarifying questions when faced with ambiguous requests, rather than making assumptions or hallucinating.
  • GRPO with KL Anchor: The use of a KL anchor at β=0.2 was critical in preventing capability collapse observed in earlier runs, leading to a measurable improvement on held-out evaluations while preserving breadth across task families.
  • Cost-Efficient Training: The model was trained in approximately 78 minutes on a single A100 GPU, costing around $1.80, demonstrating efficient RL fine-tuning.
  • Specific Task Families: Evaluated across five task families including event_planning, medical_intake, meeting_scheduling, support_triage, and coding, with notable improvements in event_planning over the base model.
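The "ask-first" behavior matters mainly to the agent loop around the model: a clarifying question should be routed back to the user rather than executed as a plan. Below is a minimal sketch of such a loop, with a stubbed `generate` function standing in for the actual model call; the heuristic and function names are illustrative assumptions, not part of the model's API.

```python
def generate(history: list[dict]) -> str:
    """Stub for the model call; replace with real inference against the model."""
    # A clarification-tuned model should respond to an underspecified
    # request with questions before proposing a plan.
    return "How many attendees, and what is your budget?"

def is_clarifying_question(reply: str) -> bool:
    """Crude heuristic: treat a reply ending in '?' as a clarification."""
    return reply.rstrip().endswith("?")

def agent_step(history: list[dict], user_input: str):
    """One turn of an ask-first agent loop."""
    history = history + [{"role": "user", "content": user_input}]
    reply = generate(history)
    history.append({"role": "assistant", "content": reply})
    if is_clarifying_question(reply):
        action = "ask_user"   # surface the question; do not execute anything yet
    else:
        action = "execute"    # reply is a concrete plan; hand off to tools
    return action, history

action, history = agent_step([], "Plan an event for my team.")
print(action)  # → ask_user  (given the stubbed reply above)
```

A production loop would replace the ending-punctuation heuristic with something sturdier (e.g. a structured output field), but the branching structure is the point: clarifications go back to the user, plans go to tools.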

Should I use this for my use case?

Good for:

  • Research and Hackathons: Reproducing the KL-anchor ablation study on a small reasoner.
  • Demo and Education: Illustrating how a 1.7B parameter model can be guided towards an "ask-first" policy with a small RL budget.
  • Agentic, Multi-turn, Tool-using Settings: Ideal as a drop-in replacement for Qwen/Qwen3-1.7B where an agent needs to clarify ambiguous requests instead of hallucinating.
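Since the model shares the Qwen/Qwen3-1.7B architecture, swapping it in is a one-line change of the checkpoint id. A hedged sketch using the standard `transformers` auto classes (the import is deferred so the snippet can be read and checked without the dependency installed):

```python
BASE_ID = "Qwen/Qwen3-1.7B"
MODEL_ID = "anurag203/clarify-rl-run4-qwen3-1.7b-beta0.2"  # drop-in replacement

def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model with the standard transformers auto classes."""
    # Imported here so the sketch can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return tokenizer, model

# tokenizer, model = load_model()  # uncomment to download weights and run
```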

Not suitable for:

  • General chat assistance or open-ended prompts, as its reward shaping is highly specific.
  • Production, safety-critical, medical, or legal applications due to lack of RLHF safety alignment.
  • Non-English tasks, as it is limited to English.