jiayicheng/mix329_tillend_bc329
The jiayicheng/mix329_tillend_bc329 model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B by jiayicheng. It was trained on the sft_fullteacher329_hf_same dataset and supports a context length of 32768 tokens.
Model Overview
jiayicheng/mix329_tillend_bc329 is an 8-billion-parameter language model derived from the Qwen/Qwen3-8B architecture and fine-tuned on the sft_fullteacher329_hf_same dataset, giving it a specialization beyond the base model. It supports a context length of 32768 tokens, allowing it to process and generate long sequences of text.
Training Details
The fine-tuning process involved several key hyperparameters:
- Learning Rate: 4e-05
- Batch Size: 1 (train), 8 (eval)
- Gradient Accumulation Steps: 4
- Optimizer: fused AdamW (adamw_torch_fused) with betas=(0.9, 0.98) and epsilon=1e-08
- LR Scheduler: Cosine with a warmup ratio of 0.1
- Epochs: 7.0
The model was trained under this configuration on a multi-GPU setup with 4 devices.
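The schedule and batching implied by these hyperparameters can be sketched as follows. This is a minimal reconstruction, not the exact trainer code: it assumes the standard linear-warmup-plus-cosine-decay formulation (as implemented in Hugging Face transformers), and the device and accumulation figures come from the card above.

```python
import math

# Hyperparameters taken from the training configuration above.
BASE_LR = 4e-05
WARMUP_RATIO = 0.1
PER_DEVICE_BATCH = 1   # train batch size per device
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 4

def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup for the first 10% of steps, then cosine decay to zero."""
    warmup_steps = int(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return BASE_LR * step / max(warmup_steps, 1)
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size seen by the optimizer per update:
# per-device batch * gradient accumulation * number of devices.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
print(effective_batch)        # 16
print(cosine_lr(100, 1000))   # peak LR right after warmup: 4e-05
print(cosine_lr(1000, 1000))  # fully decayed at the final step: 0.0
```

So although the per-device batch size is only 1, the optimizer effectively sees 16 examples per update.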
Intended Use Cases
Given the Qwen3-8B base architecture and this fine-tuning, the model is best suited to tasks aligned with the characteristics of the sft_fullteacher329_hf_same dataset. Developers should evaluate its performance on tasks that require contextual understanding and generation within its 32K-token context window.
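A minimal inference sketch using Hugging Face transformers is shown below. Only the repository id comes from this card; the dtype and device settings, the chat-template usage, and the prompt are illustrative assumptions, so adjust them for your environment.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jiayicheng/mix329_tillend_bc329"  # repo id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # assumption: use the checkpoint's native dtype
    device_map="auto",   # assumption: place layers on available devices
)

# Assumption: the model inherits Qwen3's chat template from its base.
messages = [{"role": "user", "content": "Summarize the key points of this report: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Note that generating near the full 32K-token window requires correspondingly large GPU memory for the KV cache.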