lihaoxin2020/qwen3-4b-sft-gpt54-ep2-instance-rubric-gpt54-step200

TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Apr 26, 2026Architecture:Transformer Cold

The lihaoxin2020/qwen3-4b-sft-gpt54-ep2-instance-rubric-gpt54-step200 is a 4 billion parameter Qwen3-based model with a 32768 token context length. This model is a GRPO checkpoint, indicating it is a specific iteration from a training run focused on refining model performance. Its primary purpose is likely related to instruction following or specific task execution, given its 'sft-gpt54-ep2-instance-rubric-gpt54-answer_only' designation.

Loading preview...

Model Overview

The lihaoxin2020/qwen3-4b-sft-gpt54-ep2-instance-rubric-gpt54-step200 is a 4 billion parameter language model built on the Qwen3 architecture, featuring a substantial context length of 32768 tokens. This particular version represents a GRPO checkpoint, signifying a specific stage in its training process, likely involving Guided Reinforcement Learning from Human Feedback (GRPO) or a similar refinement technique.

Key Characteristics

  • Architecture: Based on the Qwen3 model family.
  • Parameter Count: 4 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a long context window of 32768 tokens, enabling processing of extensive inputs and generating coherent, long-form responses.
  • Training Stage: Identified as a 'step 200' GRPO checkpoint, indicating a refined state from a specific training run.

Potential Use Cases

Given its designation, this model is likely optimized for:

  • Instruction Following: Executing complex instructions and generating precise outputs.
  • Rubric-Based Evaluation: Potentially designed for tasks involving adherence to specific rubrics or guidelines.
  • Refined Response Generation: Benefiting from GRPO training, it may excel in generating high-quality, aligned, and nuanced responses for specific applications.

Further details on its specific training objectives and performance can be found in the associated Weights & Biases run: https://wandb.ai/lihaoxin2020-yale-university/refiner-sft-grpo/runs/53i5b2nq.