Name: DCAgent/g1_timeout_e1_gpt_long_sampled_swesmith_psu_thinking_tacc-Qwen3-32B API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: DCAgent

Model Overview

This model, g1_timeout_e1_gpt_long_sampled_swesmith_psu_thinking_tacc-Qwen3-32B, is a fine-tuned variant of the Qwen3-32B base model developed by Qwen. It has been specialized through training on a unique dataset: /scratch/08134/negin/hub/datasets--DCAgent--g1_timeout_e1_gpt_long_sampled_swesmith_psu_d1_original_40k_glm47_traces_thinking_preprocessed.

Key Training Details

The fine-tuning process involved specific hyperparameters:

Learning Rate: 4e-05
Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08
Batch Size: 1 (train), 8 (eval) with a total effective batch size of 32 across 32 devices
Epochs: 7.0
Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio

This specialized training indicates an optimization for tasks related to the specific characteristics of its training data, making it suitable for use cases aligned with the dataset's domain. The model leverages a 32 billion parameter architecture and supports a context length of 32768 tokens.

Overview

Model Overview

Key Training Details

Full Model Card (README)