EnergyAI/qwen3-4b-agrpo-think-lr3e-6 is a 4 billion parameter Qwen3-based causal language model fine-tuned by EnergyAI. It is specifically optimized for fill-in-the-middle multiple-choice questions (MCQ) within the energy domain, utilizing Async GRPO with an enabled 'thinking mode'. This model is designed for energy domain verification tasks, outputting answers in a specific boxed format.
## Overview
EnergyAI/qwen3-4b-agrpo-think-lr3e-6 is a specialized 4 billion parameter model built upon the Qwen3-4B architecture. It has been fine-tuned using Async GRPO (Asynchronous Group Relative Policy Optimization), notably with its 'thinking mode' enabled. This configuration is tailored for enhanced reasoning during task execution.
## Key Capabilities
- Energy Domain Verification: Optimized for fill-in-the-middle multiple-choice questions (MCQ) relevant to the energy sector.
- Structured Output: Designed to output answers in a precise `\boxed{N}` format, where N represents the option number.
- Reinforcement Learning: Leverages Async GRPO for training, incorporating a reward function that penalizes incorrect or missing answers and rewards correct ones.
- Thinking Mode: The `enable_thinking=True` setting during training suggests an internal reasoning process to improve answer accuracy.
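The reward scheme above can be sketched as follows. The exact reward function used in training is not published; the signature, regex, and reward values here are illustrative assumptions.

```python
import re

def mcq_reward(completion: str, correct_option: int) -> float:
    """Return +1 for a correct \\boxed{N} answer, -1 if wrong or missing.

    Hypothetical sketch of the reward described in the model card:
    correct answers are rewarded, incorrect or missing ones penalized.
    """
    match = re.search(r"\\boxed\{(\d+)\}", completion)
    if match is None:
        return -1.0  # missing or unparseable answer is penalized
    return 1.0 if int(match.group(1)) == correct_option else -1.0
```

A binary reward like this keeps the verification signal simple: the policy only benefits from completions that both reason to the right option and emit it in the parseable boxed format.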
## Training Details
The model was trained for 2000 steps at a learning rate of 3e-6, reaching a final reward of approximately 0.45. Training used an effective batch size of 128 prompts per step and FSDP2 for parallelism. The average completion length during training was around 2370 tokens, consistent with the long reasoning traces produced in thinking mode.