EnergyAI/qwen3-4b-agrpo-nothink-lr3e-6
Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Apr 12, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

EnergyAI/qwen3-4b-agrpo-nothink-lr3e-6 is a 4 billion parameter Qwen3-based model fine-tuned using Async GRPO for fill-in-the-middle multiple-choice questions in the energy domain. This model is specifically optimized for verification tasks, outputting answers in a \boxed{N} format. It achieves a final reward of approximately 0.57, demonstrating specialized performance in its target application.


Model Overview

EnergyAI/qwen3-4b-agrpo-nothink-lr3e-6 is a 4 billion parameter language model built upon the Qwen3-4B architecture. It has been fine-tuned using Async GRPO (asynchronous Group Relative Policy Optimization), specifically in a "nothink" mode, meaning it is optimized for direct answer generation without intermediate reasoning steps. The model's primary task is to answer fill-in-the-middle multiple-choice questions (MCQs) within the energy domain, providing its response in a \boxed{N} format, where N corresponds to the option number.

Key Capabilities

  • Specialized for Energy Domain MCQs: Designed and trained for verification tasks in the energy sector.
  • Direct Answer Generation: Utilizes a "nothink" approach for efficient and direct output of answers.
  • Reinforcement Learning Fine-tuning: Leverages Async GRPO with a specific reward function:
    • +1.0 for correct answers.
    • -0.5 for incorrect answers.
    • -1.0 for no answer.
  • Optimized Training: Trained with a learning rate of 3e-6, cosine scheduler, and an effective batch size of 128 prompts/step over 2000 steps.
  • Performance: Achieved a final reward of approximately 0.57 with an average completion length of about 169 tokens.
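The reward scheme described above can be sketched as a simple verification function; the answer-extraction regex and exact-match rule here are assumptions for illustration, not the released training code:

```python
import re

def reward(completion: str, correct_option: int) -> float:
    """Reward scheme described for Async GRPO fine-tuning:
    +1.0 for a correct answer, -0.5 for an incorrect one,
    -1.0 when no \\boxed{N} answer is found."""
    match = re.search(r"\\boxed\{(\d+)\}", completion)
    if match is None:
        return -1.0  # no answer produced
    return 1.0 if int(match.group(1)) == correct_option else -0.5

print(reward(r"\boxed{2}", 2))    # 1.0
print(reward(r"\boxed{1}", 2))    # -0.5
print(reward("I don't know", 2))  # -1.0
```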

Good For

  • Automated verification of multiple-choice questions in the energy industry.
  • Applications requiring precise, formatted answers (\boxed{N}).
  • Scenarios where a compact, specialized model for domain-specific question answering is beneficial.