DCAgent/b1_top16

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Apr 7, 2026License:otherArchitecture:Transformer Warm

DCAgent/b1_top16 is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B. This model was trained on a specific dataset, /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--b1_top16/snapshots/2be82814777f95e38b73694deed12e34f91ca466_thinking_preprocessed, with a context length of 32768 tokens. It is optimized for tasks related to its fine-tuning data, suggesting specialized performance in areas covered by that dataset. The model leverages a cosine learning rate scheduler and AdamW_Torch_Fused optimizer over 7 epochs.

Loading preview...

DCAgent/b1_top16: Fine-tuned Qwen3-8B Model

DCAgent/b1_top16 is an 8 billion parameter language model derived from the Qwen/Qwen3-8B architecture. This model has undergone specific fine-tuning on a unique dataset, /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--b1_top16/snapshots/2be82814777f95e38b73694deed12e34f91ca466_thinking_preprocessed, indicating a specialization for tasks aligned with this data.

Key Training Details

  • Base Model: Qwen/Qwen3-8B
  • Parameter Count: 8 billion
  • Context Length: 32768 tokens
  • Learning Rate: 4e-05
  • Optimizer: AdamW_Torch_Fused with betas=(0.9, 0.98) and epsilon=1e-08
  • Scheduler: Cosine learning rate scheduler with 0.1 warmup ratio
  • Epochs: 7.0
  • Batch Size: 1 (train), 8 (eval) with a total effective batch size of 16 (train) and 128 (eval) across 16 devices.

Potential Use Cases

Given its fine-tuning on a specific dataset, this model is likely best suited for applications that align with the characteristics and content of the b1_top16_thinking_preprocessed dataset. Developers should investigate the nature of this dataset to determine optimal use cases. Its 32K context window allows for processing longer inputs relevant to its specialized domain.