varshak1/open_reward_agent_qwen3_8b_sft_v1
varshak1/open_reward_agent_qwen3_8b_sft_v1 is an 8-billion-parameter causal language model fine-tuned from Qwen/Qwen3-8B on the open_reward_agent_sft_mix dataset. It inherits the Qwen3 architecture and its 32768-token context length, and its specialized training data indicates it is optimized for reward-agent applications.
Model Overview
varshak1/open_reward_agent_qwen3_8b_sft_v1 is an 8-billion-parameter language model built on the Qwen3-8B architecture. It has been fine-tuned on the open_reward_agent_sft_mix dataset, indicating specialization for tasks involving reward agents.
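The model can be loaded with the standard Hugging Face transformers API. The sketch below assumes the repository ships the usual tokenizer and weight files; the prompt is purely illustrative.

```python
# Minimal inference sketch for varshak1/open_reward_agent_qwen3_8b_sft_v1,
# assuming the standard transformers chat interface. The prompt is a
# hypothetical example, not a documented input format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "varshak1/open_reward_agent_qwen3_8b_sft_v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Rate this answer for helpfulness: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```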
Key Characteristics
- Base Model: Qwen/Qwen3-8B
- Parameter Count: 8 billion
- Context Length: 32768 tokens
- Training Dataset: open_reward_agent_sft_mix
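These characteristics can be sanity-checked from the repository's configuration alone; a quick sketch, assuming a standard config.json is published:

```python
# Fetches only config.json (no model weights) to confirm the specs above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("varshak1/open_reward_agent_qwen3_8b_sft_v1")
print(config.model_type)               # expected: "qwen3"
print(config.max_position_embeddings)  # expected: 32768
```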
Training Details
The model was trained with the following hyperparameters (a sketch of the corresponding TrainingArguments follows the list):
- Learning Rate: 4e-05
- Batch Size: 1 (train), 8 (eval)
- Gradient Accumulation: 16 steps, for a total effective batch size of 128 (with a per-device train batch size of 1, this implies 8 training devices: 1 × 16 × 8 = 128)
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- Scheduler: Cosine learning rate schedule with a warmup ratio of 0.1
- Epochs: 3.0
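For reference, here is how these values map onto Hugging Face TrainingArguments. This is a reconstruction, not the author's training script; in particular, the device count of 8 is inferred from the effective batch size, and output_dir is hypothetical.

```python
# Hedged reconstruction of the reported hyperparameters as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="open_reward_agent_qwen3_8b_sft_v1",  # hypothetical path
    learning_rate=4e-05,
    per_device_train_batch_size=1,  # effective batch of 128 implies 8 devices
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,               # "0.1 warmup" read as a warmup ratio
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```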
Intended Use
While specific details on intended uses and limitations are not provided in the original documentation, the model's fine-tuning on the open_reward_agent_sft_mix dataset strongly suggests utility in applications requiring reward-based agentic behavior, or in tasks related to reinforcement learning from human feedback (RLHF) such as scoring candidate responses. Developers should weigh this specialized training when evaluating the model for such use cases.
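As a concrete illustration of such a use case, the snippet below builds a judge-style prompt that asks the model to score a candidate answer. The prompt format here is an assumption; the input format actually used in open_reward_agent_sft_mix is not documented on this card.

```python
# Hypothetical judge-style prompt; the real expected format is undocumented.
question = "What causes tides?"
answer = "Tides are caused mainly by the gravitational pull of the Moon and Sun."

messages = [{
    "role": "user",
    "content": (
        "Evaluate the following answer for correctness and helpfulness, "
        "then give a score from 1 to 10.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    ),
}]
# Pass `messages` through apply_chat_template and generate as in the
# loading example above.
```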