lihaoxin2020/qwen3-4b-sft-gpt54-ep2-evolving-rubric-gpt41-step200

Text generation · Concurrency cost: 1 · Model size: 4B · Quant: BF16 · Ctx length: 32k · Published: Apr 24, 2026 · Architecture: Transformer

The lihaoxin2020/qwen3-4b-sft-gpt54-ep2-evolving-rubric-gpt41-step200 model is a 4-billion-parameter language model based on the Qwen3 architecture, fine-tuned using Supervised Fine-Tuning (SFT). The checkpoint name indicates training guided by an evolving rubric derived from GPT-4.1, and it is a GRPO (Group Relative Policy Optimization) checkpoint saved at step 200. The model targets tasks that benefit from combined SFT and reinforcement learning, potentially excelling where nuanced response generation is required.


Model Overview

The lihaoxin2020/qwen3-4b-sft-gpt54-ep2-evolving-rubric-gpt41-step200 is a 4-billion-parameter language model, identified as a GRPO (Group Relative Policy Optimization) checkpoint. The model has undergone Supervised Fine-Tuning (SFT) and incorporates an "evolving rubric" derived from GPT-4.1, suggesting a training methodology aimed at refining its output quality.
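GRPO replaces a learned value baseline with a group-relative one: for each prompt, several responses are sampled and scored, and each response's advantage is its reward normalized against the group's mean and standard deviation. A minimal sketch of that normalization step (the function name, group size, and reward values here are illustrative, not taken from this model's training code):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each sampled response's reward against its group.

    In GRPO, this per-group z-score stands in for a learned value
    baseline: responses scored above the group mean receive a positive
    advantage, those below it a negative one.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled responses to one prompt, scored by a reward model or rubric.
advantages = group_relative_advantages([0.9, 0.4, 0.6, 0.1])
```

Because the advantages are centered on the group mean, they sum to (approximately) zero, so the policy update pushes probability toward the better responses in each group rather than toward any absolute reward level.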

Key Characteristics

  • Parameter Count: 4 billion parameters, offering a balance between performance and computational efficiency.
  • Training Methodology: Utilizes Supervised Fine-Tuning (SFT) combined with GRPO (Group Relative Policy Optimization), indicating a focus on generating high-quality, policy-aligned responses.
  • Rubric Integration: Incorporates an "evolving rubric" from GPT-4.1, implying that the model's training was guided by advanced evaluation criteria to improve its performance and alignment.
  • Context Length: Supports a substantial context length of 32768 tokens, enabling it to process and generate longer, more coherent texts.
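Qwen-family models conventionally use the ChatML prompt format. Assuming this fine-tune keeps its base model's template (the card does not say), a prompt for it can be built as below; in practice, `tokenizer.apply_chat_template` from the `transformers` library does the same job:

```python
def to_chatml(messages: list[dict[str, str]]) -> str:
    """Render a message list in ChatML, the prompt format used by
    Qwen-family models, ending with the assistant header so the
    model continues from there."""
    rendered = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    return rendered + "<|im_start|>assistant\n"

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize GRPO in one sentence."},
])
```

With the 32768-token context window, prompts of this shape can carry long documents or multi-turn histories before the assistant header.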

Potential Use Cases

This model is likely well-suited to applications where refined, contextually aware text generation is crucial. Its training with a GPT-4.1-derived rubric suggests potential strengths in:

  • Generating high-quality, nuanced responses.
  • Tasks requiring adherence to specific guidelines or styles.
  • Applications where model alignment and controlled output are important.
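The "evolving rubric" training signal can be pictured as scoring each response against a weighted checklist of criteria, with the checklist updated as training progresses. A purely illustrative sketch, in which the criteria, weights, and checks are invented rather than taken from the model's actual rubric:

```python
from typing import Callable

def rubric_score(response: str, rubric: dict[str, tuple[float, Callable[[str], bool]]]) -> float:
    """Score a response as the weighted fraction of rubric criteria it
    satisfies. Each rubric entry maps a criterion name to a (weight,
    check) pair, where check is a predicate over the response text."""
    total = sum(w for w, _ in rubric.values())
    earned = sum(w for w, check in rubric.values() if check(response))
    return earned / total

# Invented example criteria; a real rubric would be far richer and would
# "evolve" as its checks and weights are revised during training.
rubric = {
    "mentions_policy": (2.0, lambda r: "policy" in r.lower()),
    "concise":         (1.0, lambda r: len(r.split()) <= 40),
}
score = rubric_score("GRPO optimizes a policy against group-relative rewards.", rubric)
```

Scores like this can serve as the per-response rewards that the group-relative normalization above consumes, tying rubric adherence directly into the policy update.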