DCAgent/b1_top16_seq

  • Task: Text Generation
  • Model Size: 8B
  • Quantization: FP8
  • Context Length: 32k
  • Published: Apr 7, 2026
  • License: other
  • Architecture: Transformer
  • Concurrency Cost: 1

DCAgent/b1_top16_seq is an 8-billion-parameter causal language model fine-tuned from Qwen/Qwen3-8B. It was trained on /scratch/08134/negin/hub/datasets--DCAgent--b1_top16_seq, a local cluster path that appears to hold a Hugging Face cache copy of the DCAgent/b1_top16_seq dataset, so it is likely specialized for that dataset's domain. It supports a context length of 32,768 tokens, which makes it usable for long input sequences.


Model Overview

DCAgent/b1_top16_seq is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the /scratch/08134/negin/hub/datasets--DCAgent--b1_top16_seq dataset, so it is most likely to perform well on tasks that resemble that training data. Fine-tuning ran for 7 epochs with the AdamW optimizer, a peak learning rate of 4e-05 under a cosine learning-rate schedule, and a total training batch size of 16 spread across 16 devices.
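The batch-size and schedule settings above can be made concrete with a little arithmetic. The sketch below is not the training code used for this model. It assumes the total batch size is computed as per-device batch size × devices × gradient-accumulation steps (so 16 on 16 devices means 1 sample per device and no accumulation), assumes zero warmup because none is reported, and takes a hypothetical dataset size because the card does not give one. The schedule follows the warmup-then-cosine-decay form used by `get_cosine_schedule_with_warmup` in Hugging Face Transformers.

```python
import math

# Values stated on this card.
PEAK_LR = 4e-05
NUM_EPOCHS = 7
TOTAL_TRAIN_BATCH_SIZE = 16
NUM_DEVICES = 16

# Assumption: total batch = per-device batch x devices x accumulation steps,
# so per-device batch x accumulation steps must equal 16 // 16.
PER_DEVICE_X_ACCUM = TOTAL_TRAIN_BATCH_SIZE // NUM_DEVICES  # -> 1


def total_optimizer_steps(num_examples: int) -> int:
    """Approximate optimizer steps for the run (num_examples is hypothetical)."""
    steps_per_epoch = math.ceil(num_examples / TOTAL_TRAIN_BATCH_SIZE)
    return steps_per_epoch * NUM_EPOCHS


def cosine_lr(step: int, total_steps: int, warmup_steps: int = 0) -> float:
    """Linear warmup, then cosine decay from PEAK_LR down to 0.

    Assumption: the warmup length is not reported, so it defaults to 0.
    """
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

For example, a hypothetical 1,000-example dataset would give ceil(1000 / 16) = 63 steps per epoch and 441 optimizer steps over 7 epochs; under this sketch the learning rate starts at 4e-05, is about 2e-05 halfway through, and reaches 0 at step 441.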

Training Details

  • Base Model: Qwen/Qwen3-8B
  • Training Dataset: /scratch/08134/negin/hub/datasets--DCAgent--b1_top16_seq
  • Learning Rate: 4e-05
  • Learning Rate Scheduler: cosine
  • Optimizer: AdamW (PyTorch fused implementation, AdamW_Torch_Fused) with betas=(0.9, 0.98) and epsilon=1e-08
  • Epochs: 7
  • Total Train Batch Size: 16 (across 16 GPUs)
  • Context Length: 32768 tokens
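To show what the listed optimizer settings control, here is a minimal scalar sketch of a single AdamW update using betas=(0.9, 0.98) and epsilon=1e-08. It is an illustration, not the fused PyTorch kernel: the fused variant changes how the update is executed on the GPU, not the update rule. Weight decay is not reported on this card, so `WEIGHT_DECAY = 0.0` is a placeholder assumption, and the function name `adamw_step` is hypothetical.

```python
import math

# Hyperparameters stated on this card.
LR = 4e-05
BETA1, BETA2 = 0.9, 0.98
EPS = 1e-08
# Assumption: weight decay is not reported; 0.0 is a placeholder.
WEIGHT_DECAY = 0.0


def adamw_step(param, grad, m, v, t, lr=LR, weight_decay=WEIGHT_DECAY):
    """One AdamW update for a scalar parameter; t is the 1-based step count.

    Returns the updated (param, m, v).
    """
    # Decoupled weight decay: shrink the parameter directly, not via the gradient.
    param = param - lr * weight_decay * param
    # Exponential moving averages of the gradient and the squared gradient.
    m = BETA1 * m + (1 - BETA1) * grad
    v = BETA2 * v + (1 - BETA2) * grad * grad
    # Bias correction compensates for the zero-initialised moments.
    m_hat = m / (1 - BETA1 ** t)
    v_hat = v / (1 - BETA2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + EPS)
    return param, m, v


p, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio m_hat / sqrt(v_hat) is roughly the sign of the gradient, so the parameter moves by about one learning rate (4e-05). The higher beta2 of 0.98, compared with the common default of 0.999, makes the second-moment estimate react faster to changes in gradient scale.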

Intended Use

Specific intended uses and limitations are not documented. Because the model was fine-tuned on a specialized dataset, it is most likely to be useful for tasks in that dataset's domain. Developers should evaluate it on tasks representative of its training data before relying on it.
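One way to act on that advice is a small evaluation loop that also respects the 32,768-token context window. The sketch below is a hedged illustration, not an official harness for this model: `generate` and `count_tokens` are hypothetical hooks you would wire to the model's generation call and its tokenizer, exact match is only a placeholder metric, and the 512-token generation budget is an assumption.

```python
from typing import Callable, Iterable, Tuple

MAX_CONTEXT_TOKENS = 32768  # context length stated on this card


def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Prompt plus generated tokens must fit in the context window."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT_TOKENS


def exact_match_accuracy(
    generate: Callable[[str], str],
    count_tokens: Callable[[str], int],
    examples: Iterable[Tuple[str, str]],
    max_new_tokens: int = 512,  # assumption: generation budget per prompt
) -> float:
    """Fraction of (prompt, reference) pairs whose output matches exactly.

    `generate` and `count_tokens` are hypothetical hooks. Prompts that
    would overflow the context window are skipped, not truncated.
    """
    scored = correct = 0
    for prompt, reference in examples:
        if not fits_context(count_tokens(prompt), max_new_tokens):
            continue
        scored += 1
        if generate(prompt).strip() == reference.strip():
            correct += 1
    return correct / scored if scored else 0.0
```

Skipping over-long prompts keeps the score honest about what the model actually saw; for agent-style or open-ended tasks, exact match should be swapped for a task-appropriate metric.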