jiayicheng/mix329_tillend_bc329

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: May 7, 2026 · License: other · Architecture: Transformer

The jiayicheng/mix329_tillend_bc329 model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B by jiayicheng. It was trained on the sft_fullteacher329_hf_same dataset with a context length of 32768 tokens.


Model Overview

jiayicheng/mix329_tillend_bc329 derives from the Qwen/Qwen3-8B architecture and was fine-tuned on the sft_fullteacher329_hf_same dataset, indicating a specialized training focus beyond the base model's general capabilities. It supports a context length of 32768 tokens, allowing it to process and generate long text sequences.
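As a quick sanity check, the advertised context window can be read from the checkpoint's configuration. The snippet below is a minimal sketch, assuming the checkpoint is published on the Hugging Face Hub with standard config metadata:

```python
from transformers import AutoConfig

# Assumption: the model card's repo id resolves on the Hugging Face Hub.
config = AutoConfig.from_pretrained("jiayicheng/mix329_tillend_bc329")

# Expected to print 32768 per the model card's stated context length.
print(config.max_position_embeddings)
```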

Training Details

The fine-tuning process involved several key hyperparameters:

  • Learning Rate: 4e-05
  • Batch Size: 1 (train), 8 (eval)
  • Gradient Accumulation Steps: 4
  • Optimizer: ADAMW_TORCH_FUSED with betas=(0.9,0.98) and epsilon=1e-08
  • LR Scheduler: Cosine type with a warmup ratio of 0.1
  • Epochs: 7.0

This configuration points to a careful fine-tuning run on the target dataset. The model was trained on a multi-GPU setup with 4 devices; with a per-device train batch size of 1, gradient accumulation of 4, and 4 GPUs, the effective global batch size is 16.
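For reference, these hyperparameters map directly onto the Hugging Face Trainer API. The sketch below reproduces the reported settings; the output_dir and anything not listed above are hypothetical placeholders, not details from the model card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mix329_tillend_bc329",   # hypothetical output path
    learning_rate=4e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    optim="adamw_torch_fused",           # ADAMW_TORCH_FUSED optimizer
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=7.0,
)
```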

Intended Use Cases

Given its Qwen3-8B base and fine-tuning on sft_fullteacher329_hf_same, this model is likely suited to tasks aligned with that dataset's characteristics. Developers should evaluate its performance on tasks requiring deep contextual understanding and generation within its 32K-token context window.
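A minimal loading-and-generation sketch is shown below, assuming the checkpoint is available through the transformers library and retains the base Qwen chat template; how the FP8 weights load in practice depends on how the checkpoint was published, so the dtype handling here is an assumption:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jiayicheng/mix329_tillend_bc329"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # assumption: dtype is resolved from the checkpoint
    device_map="auto",    # requires the accelerate package
)

# Chat-style formatting (assumption: the base model's template is kept).
messages = [{"role": "user", "content": "Summarize the idea of supervised fine-tuning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```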