DCAgent/g1_weighted_31600_8b_v2
DCAgent/g1_weighted_31600_8b_v2 is an 8-billion-parameter language model from DCAgent, fine-tuned from Qwen/Qwen3-8B. It was trained on a weighted dataset derived from GLM47 traces, with a focus on agentic reasoning and complex task execution, and supports a 32768-token context window.
Model Overview
DCAgent/g1_weighted_31600_8b_v2 is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B architecture. It was trained on the dataset /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--g1_min_episodes_e1_weighted_top4_31600_glm47_traces, whose name suggests trace-based supervised fine-tuning on GLM47 agent episodes, targeting agentic behaviors and complex reasoning tasks.
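Since the model is fine-tuned from Qwen3-8B, prompts presumably follow the ChatML-style template that Qwen-family models ship with. A minimal sketch of that format is below; this assumes the fine-tune keeps the base template, and in practice you should prefer `tokenizer.apply_chat_template()` from the model's own tokenizer config.

```python
# Hedged sketch: Qwen-family models use a ChatML-style template
# (<|im_start|>role ... <|im_end|>). This assumes g1_weighted_31600_8b_v2
# keeps the base Qwen3-8B template rather than defining its own.

def build_prompt(system: str, user: str) -> str:
    """Format a single-turn conversation in ChatML style."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"  # model continues from here
    )

prompt = build_prompt("You are a helpful agent.", "Plan the next step.")
```

The trailing `<|im_start|>assistant\n` leaves the assistant turn open so generation continues in the model's voice.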
Training Details
The fine-tuning process utilized specific hyperparameters to achieve its specialized capabilities:
- Learning Rate: 2e-05
- Batch Size: 1 (train), 8 (eval)
- Gradient Accumulation: 2 steps, for an effective train batch size of 96 (1 per device × 2 accumulation steps × 48 devices).
- Optimizer: AdamW (ADAMW_TORCH_FUSED) with default betas and epsilon.
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio over 5 epochs.
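The batch-size arithmetic and the cosine-with-warmup schedule above can be sketched as follows. The total step count is a hypothetical placeholder (the card does not state the dataset size); the learning rate, warmup ratio, and batch settings are taken from the card.

```python
import math

# Values from the model card; TOTAL_STEPS is a hypothetical placeholder,
# since the card does not state how many optimizer steps 5 epochs cover.
LR_MAX = 2e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 1000  # hypothetical

# Effective train batch size: per-device batch x grad accumulation x devices
per_device_batch, grad_accum, num_devices = 1, 2, 48
effective_batch = per_device_batch * grad_accum * num_devices  # 96

def cosine_lr(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup over the first WARMUP_RATIO of steps, then cosine decay."""
    warmup_steps = int(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return LR_MAX * step / warmup_steps  # linear ramp from 0 to LR_MAX
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return LR_MAX * 0.5 * (1.0 + math.cos(math.pi * progress))  # decay to 0
```

With a 0.1 warmup ratio, the learning rate peaks at 2e-05 after the first 10% of steps and decays toward zero by the end of training.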
This configuration reflects a multi-device, resource-intensive fine-tuning run aimed at the model's target agentic tasks. The model operates within a 32768-token context window, allowing it to process long inputs such as extended agent traces and produce coherent, contextually grounded outputs.
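When serving long agent traces, the prompt and the generation budget share the 32768-token window. A minimal sketch of that budgeting, with the helper name chosen here for illustration:

```python
# Hedged sketch: reserving room for generation inside the 32768-token
# context window. max_input_tokens is an illustrative helper, not part of
# any library API.

CONTEXT_WINDOW = 32768

def max_input_tokens(max_new_tokens: int,
                     context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the prompt after reserving a generation budget."""
    if max_new_tokens >= context_window:
        raise ValueError("generation budget exceeds the context window")
    return context_window - max_new_tokens

# e.g. reserving 1024 tokens for output leaves 31744 for the prompt
budget = max_input_tokens(1024)
```

Prompts longer than this budget would need truncation (typically from the left, so the most recent trace steps survive).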