Concyclics/PeoplesDaily-Qwen3-4B-Base
Concyclics/PeoplesDaily-Qwen3-4B-Base is a 4-billion-parameter language model based on the Qwen3 architecture, with a context length of 40,960 tokens. It was trained with Supervised Fine-Tuning (SFT) on the Concyclics/PeoplesDaily dataset for 2 epochs, reaching a final training loss of 1.646. The model is intended for tasks in the domain covered by the PeoplesDaily dataset.
Model Overview
Concyclics/PeoplesDaily-Qwen3-4B-Base is a 4-billion-parameter language model built on the Qwen3 architecture, with a context length of 40,960 tokens. It was adapted to its target domain through Supervised Fine-Tuning (SFT) on the Concyclics/PeoplesDaily dataset.
Training Details
The SFT run used the following configuration and produced these results:
- Batch Size: 96
- Epochs: 2
- Learning Rate: 1.0e-5
- LR Scheduler Type: Cosine
- Warmup Ratio: 0.1
- Total Compute: 483 TFLOPs (total floating-point operations, not a per-second rate)
- Final Training Loss: 1.646
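The reported schedule (peak learning rate 1.0e-5, cosine decay, warmup ratio 0.1) can be sketched as follows. This is a minimal sketch, assuming the common convention of linear warmup over the first 10% of optimizer steps (rounded up) followed by half-cosine decay to zero, as in Hugging Face Transformers' `get_cosine_schedule_with_warmup`; the training framework actually used is not stated here. It also assumes one optimizer step per batch of 96 examples with no gradient accumulation. The function names `total_steps` and `cosine_lr` are hypothetical.

```python
import math

# Hyperparameters reported in the training details above.
PEAK_LR = 1.0e-5
WARMUP_RATIO = 0.1
BATCH_SIZE = 96
EPOCHS = 2


def total_steps(num_examples: int, batch_size: int = BATCH_SIZE, epochs: int = EPOCHS) -> int:
    """Optimizer steps over the whole run.

    Assumption: one step per batch, no gradient accumulation, and a partial
    final batch still counts as a step.
    """
    return math.ceil(num_examples / batch_size) * epochs


def cosine_lr(step: int, num_steps: int,
              peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to 0."""
    # Assumed convention: warmup length is the ratio times total steps, rounded up.
    warmup = math.ceil(num_steps * warmup_ratio)
    if step < warmup:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup) / max(1, num_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

For example, with a hypothetical dataset of 48,000 training examples, `total_steps(48_000)` gives 1,000 steps; the learning rate then climbs linearly to 1.0e-5 by step 100 and falls to roughly half that value at step 550, reaching zero at step 1,000.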
Potential Use Cases
Given its fine-tuning on the PeoplesDaily dataset, this model is likely well suited for:
- Processing and generating text in the style of the news reporting and public discourse found in the PeoplesDaily dataset.
- Analyzing or summarizing content similar to the training data.
- Applications that benefit from both a long context window and domain-specific adaptation.
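For summarization or analysis of documents longer than the 40,960-token context window, the input has to be split into pieces. The following is a minimal sketch, assuming the window must hold the prompt template, one document chunk, and the generated output together; the reserve sizes, the overlap, and the function name `chunk_token_ids` are hypothetical choices, not values from this model card. In practice the token IDs would come from the model's own tokenizer (for example, one loaded with `AutoTokenizer.from_pretrained` from Hugging Face Transformers).

```python
from typing import List

CONTEXT_LENGTH = 40_960  # context length stated for this model


def chunk_token_ids(token_ids: List[int],
                    reserve_for_output: int = 1_024,  # hypothetical generation budget
                    prompt_overhead: int = 256,       # hypothetical prompt-template size
                    overlap: int = 128,               # tokens shared by adjacent chunks
                    context_length: int = CONTEXT_LENGTH) -> List[List[int]]:
    """Split token IDs into overlapping chunks that fit the context window."""
    # Tokens left for document text after reserving room for prompt and output.
    budget = context_length - reserve_for_output - prompt_overhead
    if budget <= overlap:
        raise ValueError("context budget must exceed the overlap")
    step = budget - overlap  # advance so adjacent chunks share `overlap` tokens
    chunks: List[List[int]] = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + budget])
        if start + budget >= len(token_ids):
            break  # this chunk already reaches the end of the input
        start += step
    return chunks
```

With the defaults, each chunk holds at most 39,680 tokens, and consecutive chunks share 128 tokens so that context around each boundary is not lost. Each chunk can then be summarized separately and the partial summaries combined.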