Model Overview
DCAgent/b1_top1 is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B base architecture. It was trained on the dataset /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--b1_top1/snapshots/b2309d14459711bdc32a92285257bc916445bbdc_thinking_preprocessed, indicating a focus on particular tasks or domains. The model supports a context length of 32768 tokens, allowing it to process and reason over lengthy inputs.
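As a hypothetical usage sketch (not taken from the model card), the model can be loaded with the standard Hugging Face transformers chat workflow; the generation settings below are illustrative assumptions:

```python
# Hypothetical usage sketch for DCAgent/b1_top1, assuming the standard
# transformers causal-LM API. Repo id matches the card; everything else
# (prompt, max_new_tokens) is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "DCAgent/b1_top1"
MAX_CONTEXT = 32768  # context length stated in the model card

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the attention mechanism."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Guard against exceeding the 32768-token window before generating.
assert inputs.shape[-1] < MAX_CONTEXT

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Loading an 8B model this way typically requires a GPU with sufficient memory (or quantization), which is why `device_map="auto"` is used here.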
Training Details
The fine-tuning process utilized specific hyperparameters to optimize performance:
- Learning Rate: 4e-05
- Batch Sizes: 1 (train), 8 (eval)
- Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08
- LR Scheduler: Cosine with a warmup ratio of 0.1
- Epochs: 7.0
- Distributed Training: Multi-GPU setup across 16 devices.
This configuration suggests a thorough fine-tuning run aimed at adapting the base Qwen3-8B model to the domain of the custom training dataset. Intended uses and limitations are not documented here, but the fine-tuned nature implies stronger performance on tasks aligned with its training data.
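The hyperparameters above can be collected into a single configuration; this is a reconstruction whose keys mirror Hugging Face `TrainingArguments` field names, not the authors' actual training script:

```python
# Reconstruction of the reported fine-tuning hyperparameters as a plain dict
# whose keys mirror Hugging Face TrainingArguments field names. This is a
# sketch from the model card, not the authors' original configuration.
sft_config = {
    "learning_rate": 4e-05,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 8,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.98,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "num_train_epochs": 7.0,
    "world_size": 16,  # multi-GPU setup across 16 devices
}

# Effective global train batch size = per-device batch * number of devices
# (assuming no gradient accumulation, which the card does not mention).
global_batch = (
    sft_config["per_device_train_batch_size"] * sft_config["world_size"]
)
print(global_batch)  # → 16
```

With a per-device batch of 1 across 16 GPUs, the effective global batch size is 16 unless gradient accumulation was also used.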