Model Overview
DCAgent/b1_top32 is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the dataset snapshot located at /scratch/08134/negin/hub/datasets--DCAgent--b1_top32/snapshots/672f249bde596b1bd0c44d2ba49e33deda128ebd.
Training Details
The model was trained with the following hyperparameters:
- Learning rate: 4e-05
- Per-device train batch size: 1; per-device eval batch size: 8
- Distributed training across 16 devices, giving a total train batch size of 16 and a total eval batch size of 128
- Optimizer: ADAMW_TORCH_FUSED with specific beta and epsilon values
- Learning rate scheduler: cosine, with a warmup ratio of 0.1, over 7 epochs

The training environment included Transformers 4.57.3, PyTorch 2.9.0+cu128, Datasets 4.4.1, and Tokenizers 0.22.1.
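The total batch sizes reported above follow directly from the per-device settings and the device count. A minimal sketch of the configuration (a plain Python dict for illustration, not an actual `TrainingArguments` object, and assuming no gradient accumulation):

```python
# Sketch of the reported fine-tuning configuration. Values are taken from
# the model card; "num_devices" reflects the 16-device distributed setup.
config = {
    "learning_rate": 4e-05,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 8,
    "num_devices": 16,
    "optimizer": "ADAMW_TORCH_FUSED",
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "num_train_epochs": 7,
}

# Effective batch size = per-device batch size x number of devices.
total_train_batch_size = config["per_device_train_batch_size"] * config["num_devices"]
total_eval_batch_size = config["per_device_eval_batch_size"] * config["num_devices"]
print(total_train_batch_size, total_eval_batch_size)  # 16 128
```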
Key Characteristics
- Base Model: Qwen3-8B
- Parameter Count: 8 billion
- Context Length: 32768 tokens
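Because the context window is fixed at 32768 tokens, longer inputs must be truncated or windowed before inference. A minimal sketch in pure Python (the helper name and the token-ID lists are illustrative, with no tokenizer dependency):

```python
MAX_CONTEXT = 32768  # context length stated on this model card


def truncate_to_context(token_ids, max_len=MAX_CONTEXT, keep="tail"):
    """Clamp a token-ID sequence to the model's context window.

    keep="tail" keeps the most recent tokens (typical for chat history);
    keep="head" keeps the beginning of the document instead.
    """
    if len(token_ids) <= max_len:
        return token_ids
    return token_ids[-max_len:] if keep == "tail" else token_ids[:max_len]


ids = list(range(40000))  # stand-in for a tokenized input that is too long
clipped = truncate_to_context(ids)
print(len(clipped))  # 32768
```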
Further details regarding its specific intended uses, limitations, and performance benchmarks are not provided in the available documentation.