varshak1/open_reward_agent_qwen3_8b_sft_v1

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quantization: FP8 · Context Length: 32k · Published: Apr 21, 2026 · License: other · Architecture: Transformer

The varshak1/open_reward_agent_qwen3_8b_sft_v1 is an 8-billion-parameter causal language model fine-tuned from Qwen/Qwen3-8B on the open_reward_agent_sft_mix dataset. It targets agentic tasks, pairing the Qwen3 architecture with a 32,768-token context length, and its training on a specialized dataset suggests optimization for reward-agent applications.


Model Overview

The varshak1/open_reward_agent_qwen3_8b_sft_v1 is an 8-billion-parameter language model built on the Qwen3-8B architecture. It has been fine-tuned on the open_reward_agent_sft_mix dataset, indicating specialization for reward-agent tasks. A loading sketch follows the characteristics list below.

Key Characteristics

  • Base Model: Qwen/Qwen3-8B
  • Parameter Count: 8 billion
  • Context Length: 32768 tokens
  • Training Dataset: Fine-tuned on open_reward_agent_sft_mix
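
To make the above concrete, here is one way to load and query the checkpoint with Hugging Face transformers. This is a minimal sketch: the repo id comes from this card, but the precision handling, device placement, and prompt are illustrative assumptions, not documented behavior.

```python
# Minimal loading sketch (assumes a recent transformers release with Qwen3
# support, plus accelerate installed for device_map="auto").
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "varshak1/open_reward_agent_qwen3_8b_sft_v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # defer to the checkpoint's stored precision
    device_map="auto",    # spread layers across available accelerators
)

# Quick generation check; the chat template is inherited from Qwen3.
messages = [{"role": "user", "content": "Briefly explain what a reward model does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```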

Training Details

The model underwent training with the following hyperparameters (a TrainingArguments sketch follows the list):

  • Learning Rate: 4e-05
  • Batch Size: 1 per device (train), 8 per device (eval)
  • Gradient Accumulation: 16 steps; combined with the per-device train batch size of 1, the reported total effective batch size of 128 implies training across 8 devices (1 × 16 × 8 = 128)
  • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • Scheduler: Cosine learning rate schedule with a warmup ratio of 0.1
  • Epochs: 3.0
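
The sketch below expresses these settings as Hugging Face TrainingArguments. The hyperparameter values match the card; the output directory and the 8-device assumption behind the effective batch size are illustrative, not confirmed.

```python
# Hedged sketch: the reported hyperparameters mapped onto TrainingArguments.
# Values come from this card; everything else is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="open_reward_agent_qwen3_8b_sft_v1",  # hypothetical path
    learning_rate=4e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,
    # Effective batch size: 1 (per device) x 16 (accumulation) x 8 (assumed
    # devices) = 128, matching the total reported on this card.
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```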

Intended Use

The original documentation does not specify intended uses or limitations. The fine-tuning dataset name, open_reward_agent_sft_mix, strongly suggests applications involving reward-based agentic behavior or tasks related to reinforcement learning from human feedback (RLHF). Developers should weigh this specialized training before applying the model to other use cases. A hypothetical prompting example follows.
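
As an illustration only, the snippet below asks the model to judge two candidate answers, a common reward-agent pattern. The prompt wording is entirely hypothetical; the input format the open_reward_agent_sft_mix fine-tune actually expects is not documented, so treat this as a starting point to validate against your own evaluations. It reuses the `tokenizer` and `model` from the loading sketch above.

```python
# Hypothetical reward-judging prompt; the fine-tune's real expected format
# is not documented on this card.
prompt = (
    "Compare the two responses to the question and state which is better.\n\n"
    "Question: What is gradient accumulation?\n\n"
    "Response A: It accumulates gradients over several forward/backward "
    "passes before each optimizer step, emulating a larger batch size.\n\n"
    "Response B: It makes training faster by skipping gradient computation.\n\n"
    "Verdict:"
)

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```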