DCAgent/d1_hardened_top4_seq_glm47

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Apr 12, 2026 | License: other | Architecture: Transformer | Cold

DCAgent/d1_hardened_top4_seq_glm47 is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--d1_hardened_top4_seq_glm47_traces dataset, which suggests a specialization in the kind of sequences that dataset contains. Its context length is 32,768 tokens, which leaves room for long inputs.
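Because the context window is fixed at 32,768 tokens, the prompt and the generated output have to share it. The sketch below shows one way to work out how many tokens remain for generation. The function name and the `reserve` parameter are hypothetical and not part of any library; only the 32,768 figure comes from this card.

```python
CONTEXT_LENGTH = 32768  # context length stated on this model card


def max_new_tokens_for(prompt_tokens: int,
                       context_length: int = CONTEXT_LENGTH,
                       reserve: int = 0) -> int:
    """Return how many tokens remain for generation.

    `reserve` is a hypothetical safety margin (for example, for special
    tokens added by a chat template); it is an assumption, not a card value.
    """
    if prompt_tokens < 0 or reserve < 0:
        raise ValueError("prompt_tokens and reserve must be non-negative")
    # Clamp at zero: a prompt longer than the window leaves no room at all.
    return max(0, context_length - prompt_tokens - reserve)
```

A prompt that already exceeds the window yields zero, which a caller can treat as a signal to truncate or summarize the input first.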


Model Overview

DCAgent/d1_hardened_top4_seq_glm47 is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--d1_hardened_top4_seq_glm47_traces dataset, which points to a focus on processing or generating the sequential data that dataset contains.

Key Training Details

The fine-tuning process involved several key hyperparameters:

  • Learning Rate: 4e-05
  • Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio
  • Epochs: 7.0
  • Batch Size: A total training batch size of 16 across 16 devices.
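To make these settings concrete, here is a minimal sketch of a cosine schedule with linear warmup, plus the per-device batch size implied by the totals above. It is a common formulation of this schedule, not necessarily the exact code used for training. The total step count is not given on this card, so `total_steps` is a hypothetical input, and both function names are illustrative.

```python
import math

PEAK_LR = 4e-05      # learning rate from the card
WARMUP_RATIO = 0.1   # warmup ratio from the card
TOTAL_BATCH = 16     # total training batch size from the card
NUM_DEVICES = 16     # device count from the card


def per_device_batch(total_batch: int, num_devices: int,
                     grad_accum_steps: int = 1) -> int:
    """Per-device batch size implied by the total batch size.

    `grad_accum_steps` is an assumption; the card does not state it.
    """
    per_device, remainder = divmod(total_batch, num_devices * grad_accum_steps)
    if remainder or per_device == 0:
        raise ValueError("total batch must be a positive multiple of "
                         "num_devices * grad_accum_steps")
    return per_device


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = PEAK_LR,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

With 16 devices and a total batch size of 16, `per_device_batch(TOTAL_BATCH, NUM_DEVICES)` returns 1, so each device processes one example per optimizer step and there is no room for gradient accumulation.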

Potential Use Cases

Because the model was fine-tuned on a single dataset, it is most likely to suit applications that resemble the d1_hardened_top4_seq_glm47_traces data. Developers should evaluate its performance on their own tasks before relying on it, particularly where the inputs differ from that kind of sequential data.