Name: lihaoxin2020/qwen3-4b-sft-gpt54-ep2-instance-rubric-gpt54-step200 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: lihaoxin2020

Model Overview

The lihaoxin2020/qwen3-4b-sft-gpt54-ep2-instance-rubric-gpt54-step200 is a 4 billion parameter language model built on the Qwen3 architecture, featuring a substantial context length of 32768 tokens. This particular version represents a GRPO checkpoint, signifying a specific stage in its training process, likely involving Guided Reinforcement Learning from Human Feedback (GRPO) or a similar refinement technique.

Key Characteristics

Architecture: Based on the Qwen3 model family.
Parameter Count: 4 billion parameters, offering a balance between performance and computational efficiency.
Context Length: Supports a long context window of 32768 tokens, enabling processing of extensive inputs and generating coherent, long-form responses.
Training Stage: Identified as a 'step 200' GRPO checkpoint, indicating a refined state from a specific training run.

Potential Use Cases

Given its designation, this model is likely optimized for:

Instruction Following: Executing complex instructions and generating precise outputs.
Rubric-Based Evaluation: Potentially designed for tasks involving adherence to specific rubrics or guidelines.
Refined Response Generation: Benefiting from GRPO training, it may excel in generating high-quality, aligned, and nuanced responses for specific applications.

Further details on its specific training objectives and performance can be found in the associated Weights & Biases run: https://wandb.ai/lihaoxin2020-yale-university/refiner-sft-grpo/runs/53i5b2nq.

Overview

Model Overview

Key Characteristics

Potential Use Cases

Full Model Card (README)