Model Overview
DCAgent/b1_top32_seq is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B base model. It was trained on the /scratch/08134/negin/hub/datasets--DCAgent--b1_top32_seq dataset, which suggests a specialization for tasks aligned with that dataset's domain.
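A minimal usage sketch for loading the model with the `transformers` library, assuming the checkpoint is published under the repo id `DCAgent/b1_top32_seq` (the model card does not confirm a Hub repo id, so treat it as a placeholder):

```python
model_id = "DCAgent/b1_top32_seq"  # assumed repo id; substitute a local path if needed


def load_model(model_id: str):
    """Load the tokenizer and model; requires network access or a local checkpoint.

    Imports are deferred so this sketch can be read/imported without
    transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # torch_dtype="auto" picks up the dtype stored in the checkpoint config
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return tokenizer, model
```

From there, generation follows the standard `model.generate(**tokenizer(prompt, return_tensors="pt"))` pattern.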
Key Training Details
- Base Model: Qwen/Qwen3-8B
- Dataset: /scratch/08134/negin/hub/datasets--DCAgent--b1_top32_seq
- Learning Rate: 4e-05
- Optimizer: AdamW (fused) with betas=(0.9, 0.98) and epsilon=1e-08
- Scheduler: Cosine learning rate scheduler with 0.1 warmup ratio
- Epochs: 7.0
- Batch Size: 16 (total train batch size across 16 devices)
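The learning-rate schedule above (peak 4e-05, cosine decay, 0.1 warmup ratio) can be sketched in pure Python. This assumes linear warmup followed by cosine decay to zero, which is the usual shape of Hugging Face's cosine-with-warmup scheduler; the original training script is not available, so exact step counts are illustrative:

```python
import math


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = 4e-05, warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total steps the rate ramps to 4e-05 at step 100 (the end of the 10% warmup) and decays back to 0 by step 1000.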
Intended Use & Limitations
The original model card does not state intended uses or limitations. Its fine-tuning on a single dataset suggests it may perform best on tasks aligned with that dataset's domain, but users should evaluate the model on their own applications before deployment, since no guidance is given on optimal use cases or known constraints.