DCAgent/g1_weighted_31600

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Apr 19, 2026 · License: other · Architecture: Transformer

DCAgent/g1_weighted_31600 is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--g1_min_episodes_e1_weighted_top4_31600_glm47_traces/snapshots/a4717e999b7f8e9ad717b435f2d4a5cc75535932_thinking_preprocessed dataset, which suggests a specialization in tasks resembling that training data. It supports a context length of 32768 tokens, making it suitable for processing extensive inputs.


Model Overview

DCAgent/g1_weighted_31600 is an 8-billion-parameter language model derived from Qwen/Qwen3-8B. It was fine-tuned on a specialized dataset, /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--g1_min_episodes_e1_weighted_top4_31600_glm47_traces/snapshots/a4717e999b7f8e9ad717b435f2d4a5cc75535932_thinking_preprocessed, which suggests it is optimized for tasks aligned with the characteristics of that training data. It supports a substantial context length of 32768 tokens.
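If the checkpoint is published on the Hugging Face Hub under the repo id above (an assumption; the card only lists the name), a minimal loading and generation sketch with transformers might look like the following. The prompt and generation settings are purely illustrative.

```python
# Minimal sketch, assuming the model is reachable on the Hugging Face Hub
# under "DCAgent/g1_weighted_31600"; adjust dtype/device to your hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/g1_weighted_31600"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let transformers pick a precision matching the checkpoint
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the key ideas of transformer attention."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated continuation, not the prompt tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```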

Training Details

The model was trained with the following key hyperparameters:

  • Learning Rate: 4e-05
  • Batch Size: 1 (train), 8 (eval)
  • Gradient Accumulation Steps: 2
  • Optimizer: AdamW Torch Fused with betas=(0.9, 0.98) and epsilon=1e-08
  • LR Scheduler: Cosine type with a warmup ratio of 0.1
  • Epochs: 7.0

The training utilized 48 devices with a total training batch size of 96 (1 per-device batch × 2 gradient accumulation steps × 48 devices), indicating a distributed training setup. The framework versions used include Transformers 4.57.6, PyTorch 2.9.1+cu130, Datasets 4.7.0, and Tokenizers 0.22.2.
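The training script itself is not part of the card, but a TrainingArguments configuration mirroring the listed hyperparameters could look roughly like the sketch below; the output directory and the mixed-precision setting are placeholders, not values reported by the card.

```python
# Hypothetical reconstruction of the reported hyperparameters with
# transformers.TrainingArguments; the original training script is not provided.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="g1_weighted_31600",     # placeholder output path
    learning_rate=4e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=7.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-08,
    bf16=True,                          # assumption; training precision is not stated
)
# Across 48 devices: 1 (per-device batch) x 2 (grad accumulation) x 48 = 96 total batch size.
```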

Potential Use Cases

Given its fine-tuning on a specific dataset, this model is likely best suited for:

  • Applications requiring understanding or generation based on the patterns present in the g1_min_episodes_e1_weighted_top4_31600_glm47_traces dataset.
  • Tasks benefiting from a large context window (32768 tokens), allowing for processing of extensive inputs or maintaining long-term coherence; see the sketch after this list.
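As a small illustration of the long-context point above, one might check an input against the 32768-token window before generation. This is a sketch only; it assumes the tokenizer is available under the same repo id, and the reserve value is an arbitrary example.

```python
# Sketch: verify a long input fits within the 32768-token context window
# before sending it to the model. Tokenizer availability is assumed.
from transformers import AutoTokenizer

MAX_CONTEXT = 32768
tokenizer = AutoTokenizer.from_pretrained("DCAgent/g1_weighted_31600")

def fits_in_context(text: str, reserve_for_output: int = 1024) -> bool:
    """Return True if `text` plus a generation budget fits in the context window."""
    n_tokens = len(tokenizer(text).input_ids)
    return n_tokens + reserve_for_output <= MAX_CONTEXT

long_document = "example paragraph. " * 5000  # stand-in for a long input
print(fits_in_context(long_document))
```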