DCAgent/b1_top8

Text Generation · Model Size: 8B · Quantization: FP8 · Context Length: 32k · Concurrency Cost: 1 · Published: Apr 7, 2026 · License: other · Architecture: Transformer

DCAgent/b1_top8 is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It is fine-tuned on the DCAgent/b1_top8 dataset, indicating a specialization for tasks related to that training data. Its 32,768-token context window makes it suitable for processing long inputs and generating detailed responses.
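A minimal usage sketch with the Transformers pipeline API, assuming the checkpoint is loadable by the repo id DCAgent/b1_top8 and inherits its base model's chat template (both assumptions; the card does not publish a quickstart):

```python
# Minimal inference sketch; the repo id and prompt are illustrative assumptions.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="DCAgent/b1_top8",  # assumed Hub repo id, taken from the model name
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the trade-offs of long-context inference."}]
result = generator(messages, max_new_tokens=256)

# The pipeline returns the full chat; the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```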

Overview

DCAgent/b1_top8 is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B base model. It was trained on a thinking-preprocessed snapshot of the DCAgent/b1_top8 dataset (local cache path: /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--b1_top8/snapshots/0261a53fb1e70a7ba1767f28710756d33ed1048e_thinking_preprocessed). Fine-tuning used a learning rate of 4e-05, a total training batch size of 16 across 16 GPUs, and 7 epochs, with the AdamW optimizer and a cosine learning-rate scheduler.

Key Characteristics

  • Base Model: Qwen3-8B
  • Parameter Count: 8 billion
  • Context Length: 32,768 tokens
  • Training Data: Fine-tuned on a specific dataset (DCAgent/b1_top8_thinking_preprocessed), suggesting specialized capabilities related to this data.
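To confirm the advertised context window programmatically, the checkpoint's config can be inspected; a small sketch, assuming the repo id above and the Qwen3-style max_position_embeddings field:

```python
# Sketch: read the context-window setting from the model config.
# The field name follows Qwen3's config convention; the card advertises 32,768 tokens.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("DCAgent/b1_top8")
print(config.max_position_embeddings)
```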

Training Details

The model was trained using:

  • Learning Rate: 4e-05
  • Optimizer: AdamW (fused PyTorch implementation, adamw_torch_fused)
  • LR Scheduler: Cosine with 0.1 warmup ratio
  • Epochs: 7.0
  • Frameworks: Transformers 4.57.6, PyTorch 2.9.1+cu130, Datasets 4.7.0, Tokenizers 0.22.2
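These hyperparameters map directly onto transformers.TrainingArguments. The sketch below is a hedged reconstruction, not the authors' published script; the output directory is hypothetical, and the per-device batch size of 1 (1 per GPU x 16 GPUs = total 16) is inferred arithmetic:

```python
from transformers import TrainingArguments

# Hedged reconstruction of the reported setup; not the authors' actual script.
training_args = TrainingArguments(
    output_dir="b1_top8-sft",       # hypothetical output path
    learning_rate=4e-05,
    optim="adamw_torch_fused",      # fused AdamW, as listed above
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=7.0,
    per_device_train_batch_size=1,  # assumed: 1 per GPU x 16 GPUs = total 16
)
```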

Potential Use Cases

Given its fine-tuning on the DCAgent/b1_top8_thinking_preprocessed dataset, this model is likely best suited for tasks that align with that data's domain and characteristics. Developers should evaluate its performance on representative tasks requiring deep understanding or generation within that specialized domain; a quick spot-check like the sketch below is a reasonable starting point.
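The sketch assumes the fine-tune keeps Qwen3's chat template, whose enable_thinking kwarg toggles the reasoning trace; the repo id, prompts, and preservation of that template behavior are all assumptions:

```python
# Spot-check sketch; repo id, prompts, and enable_thinking support are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/b1_top8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

domain_prompts = [
    "Replace with a representative prompt from the target domain.",
]

for prompt in domain_prompts:
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        enable_thinking=True,  # Qwen3 template kwarg, if preserved by the fine-tune
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=1024)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```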