DCAgent/g1_weighted_31600_gradnorm01

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 20, 2026 · License: other · Architecture: Transformer

DCAgent/g1_weighted_31600_gradnorm01 is an 8-billion-parameter causal language model fine-tuned from Qwen/Qwen3-8B. It was trained on a dataset derived from 'g1_min_episodes_e1_weighted_top4_31600_glm47_traces_thinking_preprocessed', which suggests optimization for agentic reasoning or complex task execution. With a 32K context length, it is likely intended for processing long conversational histories or detailed instructions in specialized applications.


Model Overview

DCAgent/g1_weighted_31600_gradnorm01 is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B base model. It was specialized by training on the thinking-preprocessed snapshot of the DCAgent g1_min_episodes_e1_weighted_top4_31600_glm47_traces dataset, loaded from the local path /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--g1_min_episodes_e1_weighted_top4_31600_glm47_traces/snapshots/a4717e999b7f8e9ad717b435f2d4a5cc75535932_thinking_preprocessed.
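
For reference, a minimal loading sketch using the Hugging Face transformers API is shown below. The repository id and the dtype/device settings are assumptions; the standard Qwen3 causal-LM loading pattern is taken as a baseline since this model is a Qwen3-8B fine-tune.

```python
# Minimal loading sketch (assumptions: the checkpoint is published under the
# repository id below and follows the standard Qwen3 causal-LM layout;
# adjust dtype/device settings to your environment).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/g1_weighted_31600_gradnorm01"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint config
    device_map="auto",    # place layers on available GPUs automatically
)
```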

Training Details

The fine-tuning run used the following hyperparameters (see the TrainingArguments sketch after this list):

  • Learning Rate: 4e-05
  • Batch Size: 1 (train), 8 (eval)
  • Gradient Accumulation: 2 steps
  • Optimizer: AdamW_Torch_Fused with betas=(0.9, 0.98)
  • Scheduler: Cosine learning rate scheduler with 0.1 warmup ratio
  • Epochs: 7.0

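Expressed through the Hugging Face Trainer API, the listed hyperparameters would look roughly like the sketch below. This is an illustrative reconstruction, not the original training script; names such as output_dir are placeholders.

```python
# Illustrative reconstruction of the reported hyperparameters using the
# Hugging Face Trainer API (not the original training configuration).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="g1_weighted_31600_gradnorm01",  # placeholder path
    learning_rate=4e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=7.0,
)
```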
This configuration, combined with training on a dataset of 'thinking_preprocessed' traces, indicates an intent to strengthen the model's capabilities in complex reasoning, planning, or agentic behavior. The model supports a context length of 32,768 tokens, making it suitable for tasks that require understanding long inputs.
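
As an illustration of how such a model might be queried, the sketch below generates a response via the tokenizer's chat template. It assumes the `model` and `tokenizer` objects from the loading example above, and it assumes the fine-tune retains Qwen3's chat template and its enable_thinking flag; the actual output format of the checkpoint's thinking traces is not documented here.

```python
# Hypothetical usage sketch; assumes `model` and `tokenizer` from the loading
# example above and that the checkpoint keeps Qwen3's chat template.
messages = [
    {"role": "user", "content": "Plan the steps needed to refactor a large codebase module."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,  # Qwen3-specific flag; assumed to carry over to this fine-tune
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```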