DCAgent/g1_weighted_31600

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Apr 19, 2026 · License: other · Architecture: Transformer

DCAgent/g1_weighted_31600 is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--g1_min_episodes_e1_weighted_top4_31600_glm47_traces/snapshots/a4717e999b7f8e9ad717b435f2d4a5cc75535932_thinking_preprocessed dataset, which suggests a specialization in tasks resembling that training data. It supports a context length of 32768 tokens, making it suitable for processing extensive inputs.


Model Overview

DCAgent/g1_weighted_31600 is an 8-billion-parameter language model derived from Qwen/Qwen3-8B. It was fine-tuned on a specialized dataset, /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--g1_min_episodes_e1_weighted_top4_31600_glm47_traces/snapshots/a4717e999b7f8e9ad717b435f2d4a5cc75535932_thinking_preprocessed, which suggests it is optimized for tasks aligned with the characteristics of that training data. It supports a substantial context length of 32768 tokens.
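If the checkpoint is published on the Hugging Face Hub under the repo id above (an assumption; the card only lists the name), a minimal loading and generation sketch with transformers might look like the following. The prompt and generation settings are purely illustrative.

```python
# Minimal sketch, assuming the model is reachable on the Hugging Face Hub
# under "DCAgent/g1_weighted_31600"; adjust dtype/device to your hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/g1_weighted_31600"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let transformers pick a precision matching the checkpoint
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the key ideas of transformer attention."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated continuation, not the prompt tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```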

Training Details

The model was trained with the following key hyperparameters:

  • Learning Rate: 4e-05
  • Batch Size: 1 (train), 8 (eval)
  • Gradient Accumulation Steps: 2
  • Optimizer: AdamW Torch Fused with betas=(0.9, 0.98) and epsilon=1e-08
  • LR Scheduler: Cosine type with a warmup ratio of 0.1
  • Epochs: 7.0

The training utilized 48 devices with a total training batch size of 96 (1 per-device batch × 2 gradient accumulation steps × 48 devices), indicating a distributed training setup. The framework versions used include Transformers 4.57.6, PyTorch 2.9.1+cu130, Datasets 4.7.0, and Tokenizers 0.22.2.
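The training script itself is not part of the card, but a TrainingArguments configuration mirroring the listed hyperparameters could look roughly like the sketch below; the output directory and the mixed-precision setting are placeholders, not values reported by the card.

```python
# Hypothetical reconstruction of the reported hyperparameters with
# transformers.TrainingArguments; the original training script is not provided.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="g1_weighted_31600",     # placeholder output path
    learning_rate=4e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=7.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-08,
    bf16=True,                          # assumption; training precision is not stated
)
# Across 48 devices: 1 (per-device batch) x 2 (grad accumulation) x 48 = 96 total batch size.
```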

Potential Use Cases

Given its fine-tuning on a specific dataset, this model is likely best suited for:

  • Applications requiring understanding or generation based on the patterns present in the g1_min_episodes_e1_weighted_top4_31600_glm47_traces dataset.
  • Tasks benefiting from a large context window (32768 tokens), allowing for processing of extensive inputs or maintaining long-term coherence; see the sketch after this list.
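As a small illustration of the long-context point above, one might check an input against the 32768-token window before generation. This is a sketch only; it assumes the tokenizer is available under the same repo id, and the reserve value is an arbitrary example.

```python
# Sketch: verify a long input fits within the 32768-token context window
# before sending it to the model. Tokenizer availability is assumed.
from transformers import AutoTokenizer

MAX_CONTEXT = 32768
tokenizer = AutoTokenizer.from_pretrained("DCAgent/g1_weighted_31600")

def fits_in_context(text: str, reserve_for_output: int = 1024) -> bool:
    """Return True if `text` plus a generation budget fits in the context window."""
    n_tokens = len(tokenizer(text).input_ids)
    return n_tokens + reserve_for_output <= MAX_CONTEXT

long_document = "example paragraph. " * 5000  # stand-in for a long input
print(fits_in_context(long_document))
```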