Overview
DCAgent/b1_top2_seq is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B base model. It was trained on the dataset cached at /scratch/08134/negin/hub/datasets--DCAgent--b1_top2_seq, suggesting specialization for tasks represented in that data.
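Assuming the checkpoint is published on the Hugging Face Hub under the repository name above, a minimal loading sketch with the standard Transformers API might look like this (the prompt is purely illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical usage sketch: load the fine-tuned checkpoint from the Hub.
model_id = "DCAgent/b1_top2_seq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place weights on available accelerators
)

prompt = "Hello"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```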
Training Details
Fine-tuning used a learning rate of 4e-05 with a per-device training batch size of 1 and an evaluation batch size of 8. Training ran for 7 epochs on a multi-GPU setup with 16 devices, giving an effective training batch size of 16. The optimizer was adamw_torch_fused (with specific beta and epsilon settings, not reported here), and a cosine learning rate scheduler with a warmup ratio of 0.1 was applied. The model was developed with Transformers 4.57.3, PyTorch 2.9.0+cu128, Datasets 4.4.1, and Tokenizers 0.22.1.
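The reported hyperparameters map onto Hugging Face `TrainingArguments` roughly as sketched below. The output directory is hypothetical, the unreported beta/epsilon values are left at their defaults, and the 16-device launch is assumed to be handled externally (e.g. by `torchrun` or `accelerate`), not by these arguments:

```python
from transformers import TrainingArguments

# A minimal sketch matching the reported fine-tuning configuration.
training_args = TrainingArguments(
    output_dir="b1_top2_seq-finetune",  # hypothetical path
    learning_rate=4e-05,
    per_device_train_batch_size=1,      # x 16 devices -> effective batch size 16
    per_device_eval_batch_size=8,
    num_train_epochs=7,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```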
Intended Uses & Limitations
The available information does not detail intended uses or limitations; examining the DCAgent--b1_top2_seq dataset itself would be necessary to understand the model's optimal applications.