Name: DCAgent/b1_top16_seq API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: DCAgent

Model Overview

DCAgent/b1_top16_seq is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the dataset at /scratch/08134/negin/hub/datasets--DCAgent--b1_top16_seq, which suggests a specialization toward tasks resembling that data, although the dataset's contents are not described here. Fine-tuning used a learning rate of 4e-05 and a total training batch size of 16 across 16 devices, and ran for 7 epochs with the AdamW optimizer and a cosine learning rate scheduler.

Training Details

Base Model: Qwen/Qwen3-8B
Training Dataset: /scratch/08134/negin/hub/datasets--DCAgent--b1_top16_seq
Learning Rate: 4e-05
Optimizer: AdamW_Torch_Fused with betas=(0.9, 0.98) and epsilon=1e-08
Epochs: 7.0
Total Train Batch Size: 16 (across 16 GPUs)
Context Length: 32768 tokens
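
To make the numbers above concrete, the following is a minimal sketch of what a cosine learning rate schedule with a peak of 4e-05 looks like, and of how a total batch size of 16 divides across 16 GPUs. It is not the authors' training code. The function names are hypothetical, and the sketch assumes no warmup phase, a minimum learning rate of 0, and no gradient accumulation, none of which are stated in the model card.

```python
import math

# Values taken from the training details above.
PEAK_LR = 4e-05
TOTAL_BATCH_SIZE = 16
NUM_DEVICES = 16


def per_device_batch_size(total_batch, num_devices, grad_accum_steps=1):
    """Split a total batch size evenly across devices.

    grad_accum_steps=1 is an assumption; the model card does not report it.
    """
    per_device, remainder = divmod(total_batch, num_devices * grad_accum_steps)
    if remainder != 0:
        raise ValueError("total batch size does not divide evenly")
    return per_device


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, min_lr=0.0):
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps.

    Assumes no warmup and min_lr=0.0; both are illustrative choices.
    """
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("per-device batch size:", per_device_batch_size(TOTAL_BATCH_SIZE, NUM_DEVICES))
    for step in (0, 250, 500, 750, 1000):
        print(step, cosine_lr(step, total_steps=1000))
```

Under these assumptions each GPU processes one sequence per optimizer step, and the learning rate falls to half its peak at the midpoint of training before reaching zero at the end.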

Intended Use

The provided information does not specify intended uses or limitations. Because the model was fine-tuned on a specialized dataset, it is likely most useful for applications in that dataset's domain. Developers should evaluate it on tasks resembling the training data before relying on it.
