FlyPig23/Qwen3-4B_Paper_Impact_media_SFT_1ep

TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Apr 7, 2026License:otherArchitecture:Transformer Cold

FlyPig23/Qwen3-4B_Paper_Impact_media_SFT_1ep is a 4 billion parameter Qwen3-based causal language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507. This model is specifically fine-tuned on the 'paper_impact_media_train' dataset, indicating an optimization for tasks related to analyzing or generating content concerning the impact of papers or media. It features a 32768 token context length and was trained for one epoch, achieving a loss of 0.0574 on the evaluation set.

Loading preview...

Overview

FlyPig23/Qwen3-4B_Paper_Impact_media_SFT_1ep is a 4 billion parameter language model built upon the Qwen3 architecture. It is a fine-tuned variant of the base model Qwen/Qwen3-4B-Instruct-2507, specifically adapted through supervised fine-tuning (SFT) for one epoch.

Key Characteristics

  • Base Model: Qwen3-4B-Instruct-2507
  • Parameter Count: 4 billion parameters
  • Context Length: 32768 tokens
  • Fine-tuning Dataset: paper_impact_media_train
  • Training Performance: Achieved a loss of 0.0574 on the evaluation set during training.
  • Training Hyperparameters: Utilized a learning rate of 2e-05, a total batch size of 64 (with gradient accumulation), and the AdamW optimizer with a cosine learning rate scheduler.

Intended Use Cases

This model is specifically fine-tuned on a dataset related to 'paper impact media'. While specific details on its intended uses and limitations are not extensively provided in the README, its training data suggests potential applications in:

  • Analyzing the impact or reception of academic papers or media content.
  • Generating summaries or insights related to research dissemination.
  • Tasks requiring understanding or creation of content within the domain of academic or media influence.

Limitations

The README explicitly states that more information is needed regarding the model's intended uses and limitations. Users should exercise caution and conduct thorough evaluations for specific applications, as the full scope of its capabilities and potential biases is not yet detailed.