Name: DCAgent/g1_min_episodes_e1_gpt_long_thinking_tacc-Qwen3-32B API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: DCAgent

Overview

DCAgent/g1_min_episodes_e1_gpt_long_thinking_tacc-Qwen3-32B is a 32-billion-parameter language model fine-tuned from Qwen3-32B. It was trained on a single dataset, /scratch/08134/negin/hub/datasets--DCAgent--g1_min_episodes_e1_gpt_long_d1_original_40k_glm47_traces/snapshots/ba6a4708542f411ad2122152b3c153c71a12e458_thinking_preprocessed, so its strengths are likely concentrated on tasks that resemble the content and structure of that data. The model supports a context length of 32768 tokens.

Training Details

Fine-tuning used the following hyperparameters:

Learning Rate: 4e-05
Batch Size (per device): 1 (train), 8 (eval)
Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08
LR Scheduler: Cosine, with a warmup ratio of 0.1
Epochs: 7.0
Distributed Training: 32 devices, giving a total train batch size of 32 (1 × 32) and a total eval batch size of 256 (8 × 32).
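
As a rough illustration of how these settings combine, the sketch below computes the effective batch sizes and a learning-rate curve of the stated shape: linear warmup over the first 10% of steps up to 4e-05, then cosine decay. It assumes a Hugging Face-style convention (warmup steps = total steps × warmup ratio, rounded up; decay to zero over one half cosine cycle) and a gradient-accumulation factor of 1, which is consistent with 1 × 32 = 32. The dataset size is not stated, so steps_per_epoch is a hypothetical placeholder, and the function names are illustrative rather than part of any training library.

```python
import math

# Values taken from the training details above.
PEAK_LR = 4e-05
WARMUP_RATIO = 0.1
EPOCHS = 7
NUM_DEVICES = 32
PER_DEVICE_TRAIN_BATCH = 1
PER_DEVICE_EVAL_BATCH = 8


def total_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Effective batch size across all devices (gradient accumulation assumed to be 1)."""
    return per_device * num_devices * grad_accum


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero over one half cycle.

    Assumption: warmup steps are rounded up, as in Hugging Face's warmup_ratio handling.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(total_batch_size(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES))  # 32
    print(total_batch_size(PER_DEVICE_EVAL_BATCH, NUM_DEVICES))   # 256

    # Hypothetical: the dataset size is not given, so steps_per_epoch is illustrative.
    steps_per_epoch = 1000
    total_steps = steps_per_epoch * EPOCHS
    for step in (0, 350, 700, 3850, total_steps):
        print(step, cosine_lr(step, total_steps))
```

With these placeholder numbers the rate peaks at step 700, falls to half the peak at the midpoint of the decay phase, and reaches zero at the final step.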

Key Characteristics

Base Model: Qwen3-32B
Parameter Count: 32 billion
Context Window: 32768 tokens
Specialization: Fine-tuned on a single dataset; performance is likely strongest on tasks that resemble that data.
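
Because prompt and generated tokens normally share one context window in decoder-only models such as Qwen3, a client may want to cap the completion length so the total stays within 32768 tokens. This is a minimal sketch under that assumption; it takes token counts as inputs (they should come from the model's own tokenizer), and the function name max_new_tokens is hypothetical, not part of any Featherless.ai API.

```python
# Context window stated in the model card.
CONTEXT_WINDOW = 32768


def max_new_tokens(prompt_tokens: int, requested: int,
                   context_window: int = CONTEXT_WINDOW) -> int:
    """Clamp a requested completion length so prompt + completion fits the window.

    Assumes prompt and completion tokens share a single window.
    Raises ValueError if the counts are negative or the prompt fills the window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    available = context_window - prompt_tokens
    if available <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens; window is {context_window}")
    return min(requested, available)


if __name__ == "__main__":
    print(max_new_tokens(30000, 4096))  # 2768
    print(max_new_tokens(1000, 512))    # 512
```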

Intended Use Cases

Because it was fine-tuned on a single specialized dataset, this model is most likely to suit applications that match the domain and format of the /scratch/08134/negin/hub/datasets--DCAgent--g1_min_episodes_e1_gpt_long_d1_original_40k_glm47_traces/snapshots/ba6a4708542f411ad2122152b3c153c71a12e458_thinking_preprocessed dataset. No evaluation results are given, so users should benchmark it on their own tasks, particularly ones similar to the fine-tuning data, before relying on it.