DCAgent/g1_weighted_31600_gradnorm01
DCAgent/g1_weighted_31600_gradnorm01 is an 8-billion-parameter causal language model fine-tuned from Qwen/Qwen3-8B. It was trained on a dataset derived from 'g1_min_episodes_e1_weighted_top4_31600_glm47_traces_thinking_preprocessed', suggesting an optimization for agentic reasoning or complex task execution. With a 32K-token context length, it is likely designed for processing extensive conversational histories or detailed instructions in specialized applications.
Model Overview
DCAgent/g1_weighted_31600_gradnorm01 is an 8-billion-parameter language model fine-tuned from the base Qwen/Qwen3-8B architecture. It was specialized by training on a single dataset snapshot: /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--g1_min_episodes_e1_weighted_top4_31600_glm47_traces/snapshots/a4717e999b7f8e9ad717b435f2d4a5cc75535932_thinking_preprocessed.
Training Details
Fine-tuning used the following hyperparameters:
- Learning Rate: 4e-05
- Batch Size: 1 (train), 8 (eval)
- Gradient Accumulation: 2 steps
- Optimizer: AdamW_Torch_Fused with betas=(0.9, 0.98)
- Scheduler: Cosine learning rate scheduler with 0.1 warmup ratio
- Epochs: 7.0
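The scheduling and batching implied by the list above can be sketched in plain Python. Note that total_steps and world_size (GPU count) are assumptions here, since neither the dataset size nor the hardware setup is stated in this card:

```python
import math

# Hyperparameters taken from the list above.
PEAK_LR = 4e-5
WARMUP_RATIO = 0.1

def effective_batch_size(per_device=1, grad_accum=2, world_size=1):
    # world_size is an assumption; the card does not state the GPU count.
    return per_device * grad_accum * world_size

def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Cosine decay with linear warmup over the first 10% of steps,
    matching the scheduler and warmup ratio listed above."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

With these values, each device contributes an effective batch of 2 sequences per optimizer step (1 per forward pass, accumulated over 2 steps); the learning rate climbs linearly to its 4e-05 peak over the first 10% of training and then decays to zero on a cosine curve.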
This configuration, combined with training on a dataset focused on 'thinking_preprocessed' traces, indicates an intent to enhance the model's capabilities in complex reasoning, planning, and agentic behavior. The model supports a context length of 32,768 tokens, making it suitable for tasks requiring extensive input understanding.
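As a rough illustration of staying within that window at inference time, here is a minimal sketch; the helper name and the reserve_for_output budget are illustrative assumptions, not part of the model's API:

```python
MAX_CONTEXT = 32_768  # context length stated in this card, in tokens

def truncate_to_context(token_ids, reserve_for_output=1024):
    """Keep only the most recent tokens so that the prompt plus generated
    output fit in the 32,768-token window. reserve_for_output is an
    assumed generation budget, not a documented value."""
    budget = MAX_CONTEXT - reserve_for_output
    return token_ids[-budget:]
```

For long agentic transcripts, truncating from the front like this preserves the most recent turns, which are usually the most relevant to the next action.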