EnergyAI/qwen3-8b-agrpo-think-lr3e-6

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Apr 11, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

EnergyAI/qwen3-8b-agrpo-think-lr3e-6 is an 8-billion-parameter causal language model based on Qwen3 and developed by EnergyAI. It was fine-tuned with Async GRPO, with thinking mode enabled, for fill-in-the-middle multiple-choice questions (MCQ) in the energy domain. The model is trained for verification tasks and outputs its selection in \boxed{N} format.


Model Overview

EnergyAI/qwen3-8b-agrpo-think-lr3e-6 is an 8-billion-parameter model built on the Qwen3-8B architecture. EnergyAI fine-tuned it with Async GRPO, an asynchronous variant of Group Relative Policy Optimization, with thinking mode enabled during training. The training targets the model's reasoning on energy-domain multiple-choice questions rather than general-purpose generation.
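To give a sense of what GRPO-style training optimizes, here is a minimal sketch of the group-relative advantage computation commonly used with GRPO: several completions are sampled for one prompt, each is scored, and every reward is normalized against its own group. This is not EnergyAI's training code. The function name, the 0/1 correctness reward, the group size of four, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar per completion sampled for the same prompt.
    `eps` avoids division by zero when every completion scores the same.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four completions for one MCQ prompt,
# rewarded 1.0 when the boxed answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct completions receive a positive advantage and incorrect ones a negative advantage. A group in which every completion scores the same yields all-zero advantages and therefore contributes no learning signal.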

Key Capabilities

  • Energy Domain Verification: The model is specifically trained for fill-in-the-middle multiple-choice questions (MCQ) within the energy sector, focusing on verification tasks.
  • Structured Output: It is designed to output its answers in a precise \boxed{N} format, where N corresponds to the option number, facilitating automated evaluation.
  • Reinforcement Learning Optimization: The model was trained with Async GRPO at a learning rate of 3e-6 with a cosine scheduler, for 2000 steps at an effective batch size of 128 prompts per step.
  • Thinking Mode: Training ran with enable_thinking=True. In Qwen3, this chat-template setting lets the model write out its intermediate reasoning in a <think>...</think> block before giving the final answer.
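The training hyperparameters listed above can be made concrete with a small sketch of a cosine learning-rate schedule. The peak rate, step count, and batch size come from the model card. The absence of warmup and the decay to a minimum of zero are assumptions, since the card does not state either, and `cosine_lr` is a hypothetical helper rather than a function from any training library.

```python
import math

PEAK_LR = 3e-6          # from the model card
TOTAL_STEPS = 2000      # from the model card
PROMPTS_PER_STEP = 128  # effective batch size, from the model card


def cosine_lr(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS, min_lr=0.0):
    """Cosine decay from peak_lr to min_lr (assumed: no warmup, min_lr = 0)."""
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Total prompts consumed over the whole run: 2000 * 128 = 256000.
total_prompts = TOTAL_STEPS * PROMPTS_PER_STEP
```

Under these assumptions the rate starts at 3e-6, passes through 1.5e-6 at step 1000, and reaches zero at step 2000.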

Use Cases

This model is suited to applications that need answers to energy-domain multiple-choice questions in a form that can be checked automatically. Because every answer is given in \boxed{N} format, automated systems can parse and validate responses without interpreting free text. Developers should consider it for tasks where precise, domain-specific verification is critical, especially within the energy industry.
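As a rough illustration of the automated parsing described above, the sketch below pulls the selected option out of a response. It assumes that options are numbered from 1, that any reasoning ends with a closing </think> tag as in Qwen3's thinking mode, and that the last \boxed{N} after the reasoning is the final answer. `extract_choice` is a hypothetical helper, not something shipped with the model.

```python
import re

# Matches \boxed{N} where N is an integer, allowing spaces inside the braces.
BOXED_RE = re.compile(r"\\boxed\{\s*(\d+)\s*\}")


def extract_choice(response, num_options):
    """Return the selected option number, or None if absent or out of range."""
    # Keep only the text after the reasoning block, if one is present,
    # so a tentative \boxed{...} inside the reasoning is not picked up.
    visible = response.rsplit("</think>", 1)[-1]
    matches = BOXED_RE.findall(visible)
    if not matches:
        return None
    choice = int(matches[-1])  # assumption: the last boxed value is the answer
    return choice if 1 <= choice <= num_options else None


# Hypothetical response for a four-option question.
sample = "<think>Maybe \\boxed{1}? No.</think>The answer is \\boxed{3}."
picked = extract_choice(sample, num_options=4)
```

Returning None for a missing or out-of-range value lets a calling system treat malformed responses as failures instead of guessing at an answer.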