FlyPig23/Qwen3-4B_Paper_Impact_SFT_1ep

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Apr 7, 2026License:otherArchitecture:Transformer Warm

FlyPig23/Qwen3-4B_Paper_Impact_SFT_1ep is a 4 billion parameter language model, fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. This model was specifically trained for one epoch on the paper_impact_sft_train dataset, achieving a validation loss of 0.0623. It is optimized for tasks related to paper impact analysis, leveraging its Qwen3 architecture and a 32768 token context length.

Loading preview...

Model Overview

FlyPig23/Qwen3-4B_Paper_Impact_SFT_1ep is a specialized 4 billion parameter language model derived from the Qwen/Qwen3-4B-Instruct-2507 architecture. It has been fine-tuned for a single epoch on the paper_impact_sft_train dataset, demonstrating a low validation loss of 0.0623.

Key Characteristics

  • Base Model: Qwen3-4B-Instruct-2507, a robust foundation for instruction-following tasks.
  • Parameter Count: 4 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a substantial context window of 32768 tokens.
  • Training Focus: Fine-tuned specifically on a dataset related to "paper impact," suggesting an optimization for tasks within this domain.
  • Training Hyperparameters: Utilized a learning rate of 2e-05, a total training batch size of 64, and a cosine learning rate scheduler with a 0.1 warmup ratio.

Potential Use Cases

  • Academic Research Analysis: Potentially useful for tasks involving the analysis or summarization of research paper impact.
  • Specialized SFT Tasks: Suitable for applications requiring a model fine-tuned on specific supervised fine-tuning (SFT) datasets, particularly those similar to paper_impact_sft_train.