FlyPig23/Qwen3-4B_Paper_Impact_model_SFT_1ep

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Apr 7, 2026 · License: other · Architecture: Transformer

FlyPig23/Qwen3-4B_Paper_Impact_model_SFT_1ep is a fine-tuned version of Qwen's Qwen3-4B-Instruct-2507 model. It was fine-tuned on the paper_impact_model_train dataset and achieves a loss of 0.0880 on its evaluation set. Its primary use case is likely paper impact analysis or similar academic text processing, building on the base Qwen3-4B architecture for specialized performance.


Qwen3-4B_Paper_Impact_model_SFT_1ep Overview

This model is a specialized fine-tuned variant of the Qwen3-4B-Instruct-2507 base model from Qwen. It has undergone supervised fine-tuning (SFT) for 1 epoch on the paper_impact_model_train dataset, resulting in a reported loss of 0.0880 on its evaluation set.
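The model card does not include a usage snippet, but since the checkpoint is a fine-tune of Qwen3-4B-Instruct-2507, it should load with the standard Hugging Face transformers causal-LM API. The sketch below is an assumption, not the author's documented usage: the repository id comes from the title above, while the prompt, dtype, and generation settings are illustrative.

```python
# Minimal usage sketch (assumption: the checkpoint loads like its Qwen3-4B base model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FlyPig23/Qwen3-4B_Paper_Impact_model_SFT_1ep"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the quantization listed above
    device_map="auto",
)

# Hypothetical prompt; the exact prompt/chat format used during fine-tuning is not documented here.
messages = [
    {"role": "user", "content": "Assess the likely impact of the following paper abstract: ..."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```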

Key Capabilities

  • Specialized Fine-tuning: Tailored for tasks related to the paper_impact_model_train dataset, suggesting potential applications in academic research analysis or impact assessment.
  • Qwen3-4B Base: Leverages the foundational capabilities of the Qwen3-4B-Instruct architecture.

Good for

  • Researchers or developers working with datasets similar to paper_impact_model_train.
  • Applications requiring a model with a specific focus on academic paper analysis or related text understanding.

Training Details

The model was trained with a learning rate of 2e-05, a batch size of 8 (total train batch size 64 across 4 GPUs with 2 gradient accumulation steps), and utilized the AdamW optimizer with a cosine learning rate scheduler. The training was performed using Transformers 4.57.1 and PyTorch 2.6.0+cu124.
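These hyperparameters map directly onto a standard transformers TrainingArguments configuration. The following is a hedged reconstruction of what that setup might look like, not the actual training script; dataset preparation and trainer wiring are omitted, and the output directory name is illustrative.

```python
# Hypothetical reconstruction of the reported training configuration;
# the values mirror the numbers stated in Training Details.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Qwen3-4B_Paper_Impact_model_SFT_1ep",
    learning_rate=2e-5,             # reported learning rate
    per_device_train_batch_size=8,  # batch size 8 per device
    gradient_accumulation_steps=2,  # 8 x 4 GPUs x 2 steps = total train batch size 64
    num_train_epochs=1,             # SFT for 1 epoch
    lr_scheduler_type="cosine",     # cosine learning rate scheduler
    optim="adamw_torch",            # AdamW optimizer
    bf16=True,                      # matches the BF16 precision listed above
)
```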