Concyclics/PeoplesDaily-Qwen3-4B-Base

Text generation | Concurrency cost: 1 | Model size: 4B | Quant: BF16 | Ctx length: 32k | License: apache-2.0 | Architecture: Transformer | Open weights

Concyclics/PeoplesDaily-Qwen3-4B-Base is a 4-billion-parameter language model based on the Qwen3 architecture, with a context length of 40960 tokens. It was trained with Supervised Fine-Tuning (SFT) on the Concyclics/PeoplesDaily dataset for 2 epochs, ending with a training loss of 1.646. The model is intended for tasks in the domain covered by the PeoplesDaily dataset.

Model Overview

Concyclics/PeoplesDaily-Qwen3-4B-Base is a 4-billion-parameter language model built on the Qwen3 architecture, with a context length of 40960 tokens. It was adapted from the base model by Supervised Fine-Tuning (SFT) on the Concyclics/PeoplesDaily dataset.
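
As a rough sizing aid, the parameter count and the BF16 precision listed for this model give a back-of-the-envelope estimate of the memory needed for the weights alone. The sketch below assumes a nominal 4,000,000,000 parameters (the exact count is not stated here) and 2 bytes per BF16 value. It ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Back-of-the-envelope estimate of weight memory. Assumptions (not from the
# model card): exactly 4e9 parameters, every parameter stored in BF16.

NOMINAL_PARAMS = 4_000_000_000   # "4B" taken at face value; the true count may differ
BYTES_PER_BF16 = 2               # bfloat16 is a 16-bit format


def weight_bytes(num_params: int, bytes_per_param: int = BYTES_PER_BF16) -> int:
    """Bytes needed to hold the weights only (no activations or KV cache)."""
    return num_params * bytes_per_param


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    total = weight_bytes(NOMINAL_PARAMS)
    print(f"Approximate weight memory: {to_gib(total):.2f} GiB")
```

Under these assumptions the weights come to roughly 7.5 GiB.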

Training Details

The SFT run used the following hyperparameters and reported the following results:

  • Batch Size: 96
  • Epochs: 2
  • Learning Rate: 1.0e-5
  • LR Scheduler Type: Cosine
  • Warmup Ratio: 0.1
  • Total FLOPs: 483 TFLOPs (as reported)
  • Final Training Loss: 1.646
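
To make the scheduler settings above concrete, here is a minimal sketch of a cosine learning-rate schedule with linear warmup, using the peak learning rate (1.0e-5) and warmup ratio (0.1) from the list. It follows the shape commonly used for this kind of schedule (linear ramp from zero, then half-cosine decay to zero). The exact implementation used for this model is not stated, and the total step count below is a hypothetical value chosen only for illustration.

```python
import math

PEAK_LR = 1.0e-5      # learning rate from the list above
WARMUP_RATIO = 0.1    # warmup ratio from the list above


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = PEAK_LR,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from the peak down to 0.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    TOTAL_STEPS = 1000  # hypothetical; the real step count is not given
    for step in (0, 50, 100, 550, 1000):
        print(f"step {step:4d}: lr = {lr_at_step(step, TOTAL_STEPS):.3e}")
```

With these settings the rate climbs to 1.0e-5 over the first 10% of steps, passes through half the peak at the midpoint of the decay phase, and reaches zero at the final step.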

Potential Use Cases

Because it was fine-tuned on the PeoplesDaily dataset, this model is likely best suited to:

  • Processing and generating text related to news and public discourse, particularly from the PeoplesDaily domain.
  • Tasks involving analysis or summarization of content similar to the training data.
  • Applications where a model with a large context window and specific domain adaptation is beneficial.