EnergyAI/qwen3-4b-agrpo-nothink-lr3e-6
Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Apr 12, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

EnergyAI/qwen3-4b-agrpo-nothink-lr3e-6 is a 4 billion parameter Qwen3-based model fine-tuned using Async GRPO for fill-in-the-middle multiple-choice questions in the energy domain. This model is specifically optimized for verification tasks, outputting answers in a \boxed{N} format. It achieves a final reward of approximately 0.57, demonstrating specialized performance in its target application.


Model Overview

EnergyAI/qwen3-4b-agrpo-nothink-lr3e-6 is a 4 billion parameter language model built upon the Qwen3-4B architecture. It has been fine-tuned using Async GRPO (asynchronous Group Relative Policy Optimization), specifically in a "nothink" mode, meaning it is optimized for direct answer generation without intermediate reasoning steps. The model's primary task is to answer fill-in-the-middle multiple-choice questions (MCQs) within the energy domain, providing its response in a \boxed{N} format, where N corresponds to the option number.

Key Capabilities

  • Specialized for Energy Domain MCQs: Designed and trained for verification tasks in the energy sector.
  • Direct Answer Generation: Utilizes a "nothink" approach for efficient and direct output of answers.
  • Reinforcement Learning Fine-tuning: Leverages Async GRPO with a specific reward function:
    • +1.0 for correct answers.
    • -0.5 for incorrect answers.
    • -1.0 for no answer.
  • Optimized Training: Trained with a learning rate of 3e-6, cosine scheduler, and an effective batch size of 128 prompts/step over 2000 steps.
  • Performance: Achieved a final reward of approximately 0.57 with an average completion length of about 169 tokens.
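The reward scheme described above can be sketched as a simple verification function; the answer-extraction regex and exact-match rule here are assumptions for illustration, not the released training code:

```python
import re

def reward(completion: str, correct_option: int) -> float:
    """Reward scheme described for Async GRPO fine-tuning:
    +1.0 for a correct answer, -0.5 for an incorrect one,
    -1.0 when no \\boxed{N} answer is found."""
    match = re.search(r"\\boxed\{(\d+)\}", completion)
    if match is None:
        return -1.0  # no answer produced
    return 1.0 if int(match.group(1)) == correct_option else -0.5

print(reward(r"\boxed{2}", 2))    # 1.0
print(reward(r"\boxed{1}", 2))    # -0.5
print(reward("I don't know", 2))  # -1.0
```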

Good For

  • Automated verification of multiple-choice questions in the energy industry.
  • Applications requiring precise, formatted answers (\boxed{N}).
  • Scenarios where a compact, specialized model for domain-specific question answering is beneficial.