DCAgent/g1_clean_hybrid_25k_8b

Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 32k · Published: Apr 23, 2026 · License: other · Architecture: Transformer

DCAgent/g1_clean_hybrid_25k_8b is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B on the g1_clean_hybrid_scaffold_25k_glm47_traces dataset. It is intended for tasks that align with that training data and supports a context length of 32,768 tokens.


Model Overview

DCAgent/g1_clean_hybrid_25k_8b is an 8-billion-parameter language model fine-tuned from the base Qwen/Qwen3-8B architecture. It was trained on a thinking-preprocessed snapshot of the DCAgent/g1_clean_hybrid_scaffold_25k_glm47_traces dataset (snapshot ad622359a4cfbac08ec8e7bbe09f4f41a72a1834).

Training Details

The fine-tuning run used the following hyperparameters (an equivalent configuration sketch follows the list):

  • Base Model: Qwen/Qwen3-8B
  • Learning Rate: 4e-05
  • Batch Size: 1 (train), 8 (eval)
  • Gradient Accumulation: 2 steps
  • Total Training Batch Size: 96 (1 per device × 2 accumulation steps × 48 GPUs)
  • Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08
  • LR Scheduler: Cosine with 0.1 warmup ratio
  • Epochs: 7.0
  • Devices: 48 GPUs
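
For reference, here is a minimal sketch of how these hyperparameters would map onto a Hugging Face transformers TrainingArguments object, assuming a standard Trainer-based fine-tuning run; the output directory and precision setting are assumptions, not taken from this card:

```python
from transformers import TrainingArguments

# Hypothetical mapping of the reported hyperparameters onto TrainingArguments.
# Effective batch size: 1 per device x 2 accumulation steps x 48 GPUs = 96.
training_args = TrainingArguments(
    output_dir="g1_clean_hybrid_25k_8b",  # assumed output path
    learning_rate=4e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=7.0,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,  # assumed; training precision is not stated in the card
)
```

Launched across 48 GPUs (for example via torchrun or accelerate), this configuration reproduces the reported effective training batch size of 96.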

Intended Use Cases

Given its fine-tuning on a specific dataset, this model is best suited for applications that align with the characteristics and content of the g1_clean_hybrid_scaffold_25k_glm47_traces dataset. Developers should consider its specialized training for tasks requiring nuanced understanding or generation within that domain. The model supports a context length of 32768 tokens.
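
As a starting point, the following is a minimal sketch of loading the model for text generation with the transformers library, assuming the checkpoint ships a standard Qwen3-style configuration and chat template; the prompt is a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/g1_clean_hybrid_25k_8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # place layers automatically across available devices
)

# Placeholder prompt; replace with a task that matches the training data.
messages = [{"role": "user", "content": "Summarize the key steps of your reasoning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Inputs up to the model's 32,768-token context window can be supplied in the same way.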