DCAgent/b1_top8_seq
  • Pipeline: Text Generation
  • Model Size: 8B
  • Quantization: FP8
  • Context Length: 32k
  • Concurrency Cost: 1
  • Published: Apr 7, 2026
  • License: other
  • Architecture: Transformer

DCAgent/b1_top8_seq is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the /scratch/08134/negin/hub/datasets--DCAgent--b1_top8_seq/snapshots/431317fbde90fded83a2730a01e3e4bcc5981bd2 dataset. Its specific optimizations and primary use cases are not documented.


Overview

DCAgent/b1_top8_seq is an 8-billion-parameter language model fine-tuned from the base model Qwen/Qwen3-8B. The fine-tuning run used the dataset located at /scratch/08134/negin/hub/datasets--DCAgent--b1_top8_seq/snapshots/431317fbde90fded83a2730a01e3e4bcc5981bd2.
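
The card does not include usage instructions. Below is a minimal inference sketch, assuming the model loads through the standard transformers API and inherits the Qwen3 chat template from its base model; the prompt and generation settings are illustrative placeholders, not values from this card.

```python
# Minimal inference sketch (assumptions: standard transformers loading,
# Qwen3-style chat template inherited from the base model).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/b1_top8_seq"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load the published weight dtype
    device_map="auto",
)

# Illustrative prompt; format it with the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain beam search in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```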

Training Details

The model was trained with the following key hyperparameters (a configuration sketch follows the list):

  • Learning Rate: 4e-05
  • Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08
  • Batch Size: A total training batch size of 16 (1 per device across 16 GPUs)
  • Epochs: 7.0
  • LR Scheduler: Cosine, with a warmup ratio of 0.1
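
For reference, these hyperparameters map onto a Hugging Face TrainingArguments configuration as sketched below. This assumes the model was fine-tuned with the transformers Trainer, which the card does not confirm; only the listed hyperparameters come from this card, and the output directory is a hypothetical placeholder.

```python
# Configuration sketch (assumption: Hugging Face Trainer was used).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="b1_top8_seq",        # hypothetical placeholder
    learning_rate=4e-05,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-08,
    per_device_train_batch_size=1,   # total batch size 16 across 16 GPUs
    num_train_epochs=7.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```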

Limitations

The available documentation does not specify the model's intended uses, limitations, or the nature of its training and evaluation data. Users should exercise caution and evaluate the model themselves before relying on it for specific applications.