DCAgent/b1_top32
Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Concurrency cost: 1 · Published: Apr 7, 2026 · License: other · Architecture: Transformer

DCAgent/b1_top32 is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the dataset snapshot at /scratch/08134/negin/hub/datasets--DCAgent--b1_top32/snapshots/672f249bde596b1bd0c44d2ba49e33deda128ebd and supports a 32,768-token context length. No task-specific optimizations or primary use cases are documented, which suggests a general-purpose fine-tune.


Model Overview

DCAgent/b1_top32 is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B architecture. It was trained on the dataset snapshot located at /scratch/08134/negin/hub/datasets--DCAgent--b1_top32/snapshots/672f249bde596b1bd0c44d2ba49e33deda128ebd.
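
Since the card does not include usage instructions, the following is a minimal inference sketch. It assumes the checkpoint is available under the Hugging Face model id DCAgent/b1_top32 and loads through the standard transformers causal-LM API; the prompt, dtype, and generation settings are illustrative only.

```python
# Minimal sketch: load DCAgent/b1_top32 and generate text.
# Assumes the checkpoint is published under this model id; adjust the path
# if you are loading the local snapshot instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/b1_top32"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; the hosted variant is served in FP8
    device_map="auto",
)

# Qwen3-style checkpoints ship a chat template; apply it if present.
messages = [{"role": "user", "content": "Summarize what a context window is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```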

Training Details

The model was trained with the following hyperparameters:

  • Learning rate: 4e-05
  • Train batch size: 1 per device, across 16 devices (total train batch size of 16)
  • Eval batch size: 8 per device (total eval batch size of 128)
  • Optimizer: ADAMW_TORCH_FUSED with its configured beta and epsilon values
  • LR scheduler: cosine, with a warmup ratio of 0.1
  • Epochs: 7

The training environment used Transformers 4.57.3, PyTorch 2.9.0+cu128, Datasets 4.4.1, and Tokenizers 0.22.1.
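
The training script itself is not published, but the listed hyperparameters map directly onto transformers' TrainingArguments. A sketch of an equivalent configuration follows; the values are taken from the list above, while the output directory and precision setting are assumptions.

```python
# Sketch of a TrainingArguments configuration matching the reported
# hyperparameters; omitted fields and the output directory are illustrative,
# not taken from the original run.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="b1_top32-finetune",   # illustrative
    learning_rate=4e-05,
    per_device_train_batch_size=1,    # 16 devices -> total train batch size 16
    per_device_eval_batch_size=8,     # 16 devices -> total eval batch size 128
    num_train_epochs=7,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    bf16=True,                        # assumption; training precision is not stated on the card
)
```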

Key Characteristics

  • Base Model: Qwen3-8B
  • Parameter Count: 8 billion
  • Context Length: 32768 tokens

Further details regarding its specific intended uses, limitations, and performance benchmarks are not provided in the available documentation.