lihaoxin2020/qwen3-4b-refiner-gpt54-rubric-v3-2-rl-lr5e-6-step100

Text generation · Model size: 4B · Quant: BF16 · Context length: 32k · Concurrency cost: 1 · Architecture: Transformer · Published: Apr 21, 2026

lihaoxin2020/qwen3-4b-refiner-gpt54-rubric-v3-2-rl-lr5e-6-step100 is a 4-billion-parameter language model developed by lihaoxin2020 and based on the Qwen3 architecture. It is a GRPO checkpoint refined from an earlier version and optimized for tasks related to the GPT54 rubric. With a context length of 32768 tokens, it is designed for applications that require nuanced evaluation or generation aligned with specific rubric criteria.


Model Overview

The lihaoxin2020/qwen3-4b-refiner-gpt54-rubric-v3-2-rl-lr5e-6-step100 is a 4-billion-parameter language model built upon the Qwen3 architecture. It represents a specific checkpoint from a training run, having undergone further refinement with GRPO (Group Relative Policy Optimization), a reinforcement-learning method that scores groups of sampled completions against a reward signal.

Key Characteristics

  • Base Model: Qwen3-4B architecture.
  • Refinement: Trained as a GRPO checkpoint, specifically refined from lihaoxin2020/qwen3-4b-refiner-gpt54-ep2.
  • Optimization Target: The model's training is geared towards performance aligned with the "GPT54 rubric," suggesting a specialization in tasks or evaluations that adhere to this specific set of criteria.
  • Context Length: Supports a substantial context window of 32768 tokens.
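The characteristics above suggest the checkpoint can be loaded like any standard Qwen3 chat model. The sketch below is a minimal example, assuming the repository follows the usual Hugging Face Hub layout with a chat template; the `generation_kwargs` helper is a hypothetical convenience for keeping prompt plus completion inside the stated 32768-token window, not part of the released checkpoint.

```python
# Sketch: loading the checkpoint with Hugging Face transformers (assumed
# standard Qwen3 layout; model id and context length taken from this card).
MODEL_ID = "lihaoxin2020/qwen3-4b-refiner-gpt54-rubric-v3-2-rl-lr5e-6-step100"
MAX_CTX = 32768  # context window stated on this card


def generation_kwargs(prompt_tokens: int, max_new: int = 1024) -> dict:
    """Clamp max_new_tokens so prompt + completion stays inside the window."""
    budget = max(0, MAX_CTX - prompt_tokens)
    return {"max_new_tokens": min(max_new, budget), "do_sample": False}


if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    messages = [{"role": "user", "content": "Summarize GRPO in one sentence."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, **generation_kwargs(inputs.shape[-1]))
    print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

BF16 weights for a 4B model need roughly 8 GB of accelerator memory; `device_map="auto"` lets transformers place the layers across available devices.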

Intended Use Cases

This model is particularly suited for applications where adherence to a specific rubric or set of evaluation guidelines (like the GPT54 rubric) is critical. It can be beneficial for:

  • Generating responses that conform to predefined quality or style standards.
  • Refining existing text to better meet specific criteria.
  • Tasks requiring nuanced understanding and application of a rubric for content creation or evaluation.
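The exact prompt format used during the model's rubric training is not documented on this card. As a hypothetical illustration of the refinement use case, the helper below assembles a prompt that presents a draft alongside numbered rubric criteria; the function name and prompt wording are assumptions, not the training format.

```python
# Hypothetical helper (not from the model card): build a refinement prompt
# that lists each rubric criterion before the draft to be revised.
def build_refine_prompt(draft: str, criteria: list[str]) -> str:
    """Assemble a prompt asking the model to refine a draft against a rubric."""
    rubric = "\n".join(f"{i}. {c}" for i, c in enumerate(criteria, 1))
    return (
        "Refine the draft below so it satisfies every rubric criterion.\n\n"
        f"Rubric:\n{rubric}\n\n"
        f"Draft:\n{draft}\n\n"
        "Refined version:"
    )


prompt = build_refine_prompt(
    "Our API is fast.",
    ["States a concrete latency figure", "Avoids unsupported superlatives"],
)
```

The resulting string would be passed as the user message in the chat template shown earlier; whatever rubric the GPT54 criteria actually specify would replace the placeholder criteria here.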