LorenaYannnnn/general_reward-Qwen3-0.6B-baseline_all_tokens_w_kl-seed_1
Text Generation | Concurrency Cost: 1 | Model Size: 0.8B | Quant: BF16 | Ctx Length: 32k | Published: Mar 21, 2026 | Architecture: Transformer | Warm

LorenaYannnnn/general_reward-Qwen3-0.6B-baseline_all_tokens_w_kl-seed_1 is a 0.8-billion-parameter Qwen3-based model. It is a general reward model, fine-tuned to evaluate and score responses against a broad range of criteria. Its primary use case is to provide a baseline reward signal for developing and optimizing other language models.
