Model Overview
DCAgent/b1_top16_seq is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B base model. It was trained on the DCAgent--b1_top16_seq dataset (referenced here by its local cache path, /scratch/08134/negin/hub/datasets--DCAgent--b1_top16_seq), which suggests specialization for tasks aligned with that dataset's domain. Fine-tuning used a learning rate of 4e-05, a total batch size of 16 across 16 devices, the AdamW optimizer with a cosine learning-rate scheduler, and ran for 7 epochs.
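As a quick-start reference, here is a minimal inference sketch. It assumes the model is published on the Hugging Face Hub under the id DCAgent/b1_top16_seq and inherits the standard Qwen3 chat template; the prompt is illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/b1_top16_seq"  # assumed Hugging Face Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # requires `accelerate`
)

# Qwen3 checkpoints ship a chat template, so build the prompt through it.
messages = [{"role": "user", "content": "Summarize the key idea of attention in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```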
Training Details
- Base Model: Qwen/Qwen3-8B
- Training Dataset: /scratch/08134/negin/hub/datasets--DCAgent--b1_top16_seq
- Learning Rate: 4e-05
- LR Scheduler: cosine
- Optimizer: AdamW_Torch_Fused with betas=(0.9, 0.98) and epsilon=1e-08
- Epochs: 7.0
- Total Train Batch Size: 16 (across 16 GPUs)
- Context Length: 32768 tokens
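For orientation, the settings above map onto a Hugging Face TrainingArguments configuration roughly as follows. This is a hypothetical reconstruction rather than the original training script: the per-device batch size of 1 is inferred from a total batch size of 16 on 16 GPUs (assuming no gradient accumulation), bf16 precision is an assumption, and unlisted settings stay at their defaults.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters; values not
# stated in the card are left at their transformers defaults.
args = TrainingArguments(
    output_dir="b1_top16_seq-finetune",   # placeholder path
    learning_rate=4e-05,
    num_train_epochs=7.0,
    per_device_train_batch_size=1,        # inferred: 1 x 16 GPUs = total batch size 16
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    bf16=True,                            # assumption; common for 8B-scale fine-tunes
)
# The 32768-token context length is enforced during data preprocessing
# (e.g., tokenizer truncation), not through TrainingArguments.
```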
Intended Use
Specific intended uses and limitations are not documented. Given the fine-tuning on a specialized dataset, the model is most plausibly suited to applications in that dataset's domain; developers should evaluate it on representative tasks before deployment, as sketched below.
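One lightweight way to run such an evaluation is to measure perplexity on held-out domain text. The sketch below is illustrative only: the sample strings are placeholders, and the Hub id is assumed as above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/b1_top16_seq"  # assumed Hugging Face Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
model.eval()

# Placeholder texts: substitute held-out samples from the target domain.
texts = ["Example domain text one.", "Example domain text two."]

losses = []
for text in texts:
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])  # causal LM loss = mean NLL per token
    losses.append(out.loss.item())

print(f"mean perplexity: {torch.exp(torch.tensor(losses).mean()).item():.2f}")
```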