jiayicheng/mix760_3step_bc760

Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: May 7, 2026 · License: other · Architecture: Transformer

The jiayicheng/mix760_3step_bc760 model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained for 7 epochs at a learning rate of 4e-05 with a cosine learning rate scheduler. It is a specialized fine-tuned variant, though its primary differentiator and intended use cases are not documented on this card.


Model Overview

This model, jiayicheng/mix760_3step_bc760, is an 8-billion-parameter language model fine-tuned from the base model Qwen/Qwen3-8B on the sft_mixed760_official760 dataset.
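Since the checkpoint derives from Qwen3-8B, it should load through the standard Hugging Face transformers API. The snippet below is a minimal sketch assuming the weights are published in a transformers-compatible format; the card does not state a serving method, and the dtype choice is illustrative.

```python
# Minimal loading sketch (assumes standard transformers-compatible weights;
# not confirmed by the model card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jiayicheng/mix760_3step_bc760"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # FP8 quant is listed above; bf16 is a safe fallback
    device_map="auto",
)
```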

Training Details

The model was trained for 7 epochs with a learning rate of 4e-05. Key hyperparameters included a per-device train_batch_size of 1, gradient_accumulation_steps of 4, and the AdamW optimizer (the beta and epsilon values are not stated on this card). A cosine learning rate scheduler with a warmup ratio of 0.1 was employed, and training was distributed across 4 GPUs.
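For reference, the stated hyperparameters map onto a transformers TrainingArguments configuration roughly as follows. Values not given on the card (optimizer betas/epsilon, precision, output directory) are assumptions filled with common Trainer defaults, not confirmed settings.

```python
from transformers import TrainingArguments

# Sketch of the reported SFT configuration; unstated values fall back
# to Trainer defaults and are assumptions, not documented settings.
training_args = TrainingArguments(
    output_dir="mix760_3step_bc760",  # hypothetical path
    learning_rate=4e-5,
    num_train_epochs=7,
    per_device_train_batch_size=1,    # train_batch_size: 1
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",              # AdamW; betas/epsilon not given on the card
    bf16=True,                        # assumption: typical for 8B fine-tunes
)
# Effective global batch size: 1 per device x 4 accumulation steps x 4 GPUs = 16
```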

Capabilities & Use Cases

As a fine-tuned version of Qwen3-8B, this model is expected to inherit the base model's general language understanding and generation capabilities. However, the specific enhancements or specialized applications resulting from fine-tuning on the sft_mixed760_official760 dataset are not documented. Users should consult further documentation for its intended uses and limitations. A basic chat-style call is sketched below.
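Continuing from the loading snippet above, the following generation example assumes the model retains the Qwen3 chat template from its base, a reasonable but unverified assumption for an SFT derivative. The prompt and generation settings are illustrative, not recommended values.

```python
# Hypothetical prompt; assumes the tokenizer ships the Qwen3 chat template.
messages = [
    {"role": "user", "content": "Summarize the transformer architecture in two sentences."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```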