DCAgent/d1_hardened_top4_seq_glm47
DCAgent/d1_hardened_top4_seq_glm47 is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B. This model was specifically trained on the /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--d1_hardened_top4_seq_glm47_traces dataset, suggesting a specialization in processing or generating sequences related to its training data. With a 32768 token context length, it is designed for tasks requiring extensive contextual understanding.
Loading preview...
Model Overview
DCAgent/d1_hardened_top4_seq_glm47 is an 8 billion parameter language model, fine-tuned from the robust Qwen/Qwen3-8B architecture. This model has been specialized through training on the /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--d1_hardened_top4_seq_glm47_traces dataset, indicating a focus on specific sequential data processing or generation tasks.
Key Training Details
The fine-tuning process involved several key hyperparameters:
- Learning Rate: 4e-05
- Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio
- Epochs: 7.0
- Batch Size: A total training batch size of 16 across 16 devices.
Potential Use Cases
Given its specialized training, this model is likely suitable for applications that align with the characteristics of the d1_hardened_top4_seq_glm47_traces dataset. Developers should evaluate its performance on tasks requiring deep understanding or generation of similar sequential data.