Name: EnergyAI/qwen3-8b-agrpo-think-lr3e-6 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: EnergyAI

Model Overview

EnergyAI/qwen3-8b-agrpo-think-lr3e-6 is an 8-billion-parameter model built on the Qwen3-8B architecture. EnergyAI fine-tuned it with Async GRPO (an asynchronous variant of Group Relative Policy Optimization, a reinforcement learning algorithm), with "thinking mode" enabled during training. This training setup is intended to improve the model's reasoning on specific tasks.

Key Capabilities

Energy Domain Verification: The model is specifically trained for fill-in-the-middle multiple-choice questions (MCQ) within the energy sector, focusing on verification tasks.
Structured Output: The model is designed to give its answer in the form \boxed{N}, where N is the number of the chosen option, which makes automated evaluation straightforward.
Reinforcement Learning Optimization: Trained with Async GRPO at a learning rate of 3e-6 with a cosine scheduler, for 2000 steps at an effective batch size of 128 prompts per step.
Thinking Mode: Training ran with enable_thinking=True, the Qwen3 chat-template switch that has the model generate an explicit reasoning trace before its final answer. The intent is to improve the model's reasoning on these tasks.
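To make the training hyperparameters above concrete, here is a minimal sketch of a cosine learning-rate schedule using the stated peak rate (3e-6), step count (2000), and batch size (128 prompts per step). The absence of warmup and the decay to a floor of zero are assumptions made for illustration; the listing does not say how the actual schedule was configured, and the function name cosine_lr is hypothetical.

```python
import math

PEAK_LR = 3e-6          # learning rate stated in the listing
TOTAL_STEPS = 2000      # training steps stated in the listing
PROMPTS_PER_STEP = 128  # effective batch size stated in the listing


def cosine_lr(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS, min_lr=0.0):
    """Cosine-annealed learning rate at a given step.

    Assumptions (not confirmed by the listing): no warmup phase,
    and the rate decays all the way to min_lr=0.0.
    """
    # Clamp the step into [0, total_steps] so out-of-range inputs are safe.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Total prompts processed over the run, from the two stated figures.
prompts_seen = TOTAL_STEPS * PROMPTS_PER_STEP

if __name__ == "__main__":
    for step in (0, 500, 1000, 1500, 2000):
        print(f"step {step:4d}: lr = {cosine_lr(step):.3e}")
    print("prompts seen:", prompts_seen)
```

Under these assumptions the rate starts at 3e-6, passes through half that value at step 1000, and reaches zero at step 2000, with 2000 × 128 = 256,000 prompts processed in total.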

Use Cases

This model is suited to applications that need accurate, verifiable answers to multiple-choice questions in the energy domain. Because the answer arrives in a fixed \boxed{N} format, automated systems can parse and validate responses without free-text interpretation. Developers should consider it for tasks where precise, domain-specific verification is critical, especially within the energy industry.
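As an illustration of the parsing step an automated system might perform, the sketch below extracts the option number from a response. It is not part of the model's tooling: the function name extract_choice is hypothetical, options are assumed to be numbered from 1, and the response is assumed to possibly contain a <think>...</think> reasoning block (the Qwen3 convention), which is stripped so that a tentative answer inside the reasoning is not mistaken for the final one.

```python
import re

# Matches \boxed{N} where N is a non-negative integer.
BOXED_RE = re.compile(r"\\boxed\{\s*(\d+)\s*\}")
# Matches a complete <think>...</think> reasoning block, across newlines.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def extract_choice(response, num_options):
    """Return the chosen option number, or None if no valid answer is found.

    Assumptions (not confirmed by the listing): options are numbered
    1..num_options, and the last \\boxed{N} outside the reasoning block
    is the final answer.
    """
    visible = THINK_RE.sub("", response)
    matches = BOXED_RE.findall(visible)
    if not matches:
        return None
    choice = int(matches[-1])
    # Reject numbers that do not correspond to an offered option.
    return choice if 1 <= choice <= num_options else None


if __name__ == "__main__":
    sample = "<think>Could be \\boxed{1}, but check B.</think>\nAnswer: \\boxed{3}"
    print(extract_choice(sample, num_options=4))
```

Returning None for a missing or out-of-range answer lets the caller count malformed responses separately from wrong ones when scoring.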
