DCAgent/g1_weighted_31600
Model Overview
DCAgent/g1_weighted_31600 is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--g1_min_episodes_e1_weighted_top4_31600_glm47_traces/snapshots/a4717e999b7f8e9ad717b435f2d4a5cc75535932_thinking_preprocessed dataset, which suggests the model is specialized for tasks matching that training data. It supports a context length of 32768 tokens.
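Because prompt and generated tokens share the 32768-token context window, long inputs must leave room for the output. A minimal sketch (hypothetical helper, operating on plain token-id lists rather than any specific tokenizer) of left-truncating a prompt to fit:

```python
def fit_to_context(prompt_ids, max_context=32768, max_new_tokens=512):
    """Left-truncate prompt token ids so prompt + generation fits the window.

    Keeps the most recent tokens, which usually matter most for coherence.
    """
    budget = max_context - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens exceeds the context window")
    return prompt_ids[-budget:] if len(prompt_ids) > budget else prompt_ids


# A 40000-token prompt is trimmed to 32768 - 512 = 32256 tokens;
# a short prompt passes through unchanged.
trimmed = fit_to_context(list(range(40000)))
short = fit_to_context([1, 2, 3])
```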
Training Details
The model was trained with the following key hyperparameters:
- Learning Rate: 4e-05
- Batch Size: 1 (train), 8 (eval)
- Gradient Accumulation Steps: 2
- Optimizer: AdamW Torch Fused with betas=(0.9, 0.98) and epsilon=1e-08
- LR Scheduler: Cosine type with a warmup ratio of 0.1
- Epochs: 7.0
The training utilized 48 devices with a total training batch size of 96, indicating a distributed setup. The framework versions used were Transformers 4.57.6, PyTorch 2.9.1+cu130, Datasets 4.7.0, and Tokenizers 0.22.2.
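The stated total batch size is consistent with the hyperparameters above: 1 (per-device) × 2 (gradient accumulation) × 48 (devices) = 96. A short sketch of that arithmetic and of a cosine schedule with linear warmup (a generic re-implementation for illustration, not the exact Transformers scheduler code):

```python
import math


def effective_batch_size(per_device=1, grad_accum=2, devices=48):
    # Global optimizer batch = per-device batch * accumulation steps * device count.
    return per_device * grad_accum * devices


def cosine_lr(step, total_steps, base_lr=4e-05, warmup_ratio=0.1):
    # Linear warmup over the first warmup_ratio of steps, then cosine decay to 0.
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With these settings the learning rate ramps to 4e-05 over the first 10% of training, then decays smoothly toward zero.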
Potential Use Cases
Given its fine-tuning on a specific dataset, this model is likely best suited for:
- Applications requiring understanding or generation based on the patterns present in the g1_min_episodes_e1_weighted_top4_31600_glm47_traces dataset.
- Tasks benefiting from a large context window (32768 tokens), allowing for processing of extensive inputs or maintaining long-term coherence.