jiayicheng/mix329_tillend_bc329

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: May 7, 2026 · License: other · Architecture: Transformer

The jiayicheng/mix329_tillend_bc329 model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B by jiayicheng. It was trained on the sft_fullteacher329_hf_same dataset with a context length of 32768 tokens.


Model Overview

jiayicheng/mix329_tillend_bc329 derives from the Qwen/Qwen3-8B architecture and was fine-tuned on the sft_fullteacher329_hf_same dataset, indicating a specialized training focus beyond the base model's general capabilities. It supports a context length of 32768 tokens, allowing it to process and generate long text sequences.
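As a quick sanity check, the advertised context window can be read from the checkpoint's configuration. The snippet below is a minimal sketch, assuming the checkpoint is published on the Hugging Face Hub with standard config metadata:

```python
from transformers import AutoConfig

# Assumption: the model card's repo id resolves on the Hugging Face Hub.
config = AutoConfig.from_pretrained("jiayicheng/mix329_tillend_bc329")

# Expected to print 32768 per the model card's stated context length.
print(config.max_position_embeddings)
```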

Training Details

The fine-tuning process involved several key hyperparameters:

  • Learning Rate: 4e-05
  • Batch Size: 1 (train), 8 (eval)
  • Gradient Accumulation Steps: 4
  • Optimizer: ADAMW_TORCH_FUSED with betas=(0.9,0.98) and epsilon=1e-08
  • LR Scheduler: Cosine type with a warmup ratio of 0.1
  • Epochs: 7.0

This configuration points to a careful fine-tuning run on the target dataset. The model was trained on a multi-GPU setup with 4 devices; with a per-device train batch size of 1, gradient accumulation of 4, and 4 GPUs, the effective global batch size is 16.
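For reference, these hyperparameters map directly onto the Hugging Face Trainer API. The sketch below reproduces the reported settings; the output_dir and anything not listed above are hypothetical placeholders, not details from the model card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mix329_tillend_bc329",   # hypothetical output path
    learning_rate=4e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    optim="adamw_torch_fused",           # ADAMW_TORCH_FUSED optimizer
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=7.0,
)
```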

Intended Use Cases

Given its Qwen3-8B base and fine-tuning on sft_fullteacher329_hf_same, this model is likely suited to tasks aligned with that dataset's characteristics. Developers should evaluate its performance on tasks requiring deep contextual understanding and generation within its 32K-token context window.
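A minimal loading-and-generation sketch is shown below, assuming the checkpoint is available through the transformers library and retains the base Qwen chat template; how the FP8 weights load in practice depends on how the checkpoint was published, so the dtype handling here is an assumption:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jiayicheng/mix329_tillend_bc329"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # assumption: dtype is resolved from the checkpoint
    device_map="auto",    # requires the accelerate package
)

# Chat-style formatting (assumption: the base model's template is kept).
messages = [{"role": "user", "content": "Summarize the idea of supervised fine-tuning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```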