Name: EnergyAI/qwen3-4b-agrpo-think-lr5e-7 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: EnergyAI

Model Overview

EnergyAI/qwen3-4b-agrpo-think-lr5e-7 is a 4 billion parameter model built upon the Qwen3-4B architecture. It has been fine-tuned using the Async GRPO (Asynchronous Generalized Reinforcement Learning with Policy Optimization) algorithm, specifically leveraging TRL's AsyncGRPOTrainer. A key feature of this model's training is the enabled 'thinking mode' (enable_thinking=True), which likely contributes to its specialized performance.

Key Capabilities

Energy Domain Verification: Designed for fill-in-the-middle multiple-choice questions (MCQ) within the energy sector.
Structured Output: Outputs answers in a precise \boxed{N} format, where N corresponds to the option number, facilitating automated parsing and verification.
Reinforcement Learning Optimization: Trained with a reward function that grants +1.0 for correct answers, -0.5 for wrong answers, and -1.0 for no answer, indicating a strong focus on accuracy and response generation.

Training Details

The model was trained with a learning rate of 5e-7, a cosine scheduler, and a substantial effective batch size of 128 prompts per step. It underwent 2000 maximum steps with 9 generations per prompt and a maximum completion length of 4096 tokens. The training utilized FSDP2 parallelism across 4 GPUs, with vLLM TP=4 for inference, demonstrating a robust and scalable training setup.

Good For

Automated assessment of energy-related multiple-choice questions.
Applications requiring precise, structured answers for verification tasks.
Research into the effectiveness of Async GRPO and 'thinking mode' in specialized domain LLMs.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)