FlyPig23/Qwen3-4B_Paper_Impact_model_SFT_1ep
FlyPig23/Qwen3-4B_Paper_Impact_model_SFT_1ep is a fine-tuned version of Qwen's Qwen3-4B-Instruct-2507 model. It was fine-tuned on the paper_impact_model_train dataset, achieving a loss of 0.0880 on its evaluation set. Its primary use case is likely paper impact analysis or similar academic text processing, leveraging the base Qwen3-4B architecture for specialized performance.
Qwen3-4B_Paper_Impact_model_SFT_1ep Overview
This model is a fine-tuned variant of Qwen's Qwen3-4B-Instruct-2507 base model. It underwent supervised fine-tuning (SFT) for 1 epoch on the paper_impact_model_train dataset, reaching a reported loss of 0.0880 on its evaluation set.
Key Capabilities
- Specialized Fine-tuning: Tailored for tasks related to the `paper_impact_model_train` dataset, suggesting potential applications in academic research analysis or impact assessment.
- Qwen3-4B Base: Leverages the foundational capabilities of the Qwen3-4B-Instruct architecture.
Good for
- Researchers or developers working with datasets similar to `paper_impact_model_train`.
- Applications requiring a model with a specific focus on academic paper analysis or related text understanding.
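A minimal loading sketch with the Transformers library (assuming the checkpoint is public on the Hugging Face Hub; the prompt text and chat format below are illustrative, since the card does not document the expected input format):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id from this card; loading downloads the full 4B-parameter weights.
model_id = "FlyPig23/Qwen3-4B_Paper_Impact_model_SFT_1ep"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative prompt; the exact task formatting used during SFT is not documented here.
messages = [
    {"role": "user", "content": "Assess the likely impact of this paper abstract: ..."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Generation parameters (sampling temperature, max tokens) are not specified in the card, so defaults are used above.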
Training Details
The model was trained with a learning rate of 2e-05 and a per-device batch size of 8 (total train batch size 64 across 4 GPUs with 2 gradient accumulation steps), using the AdamW optimizer with a cosine learning rate scheduler. Training was performed with Transformers 4.57.1 and PyTorch 2.6.0+cu124.
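The total train batch size above follows from per-device batch size × number of GPUs × gradient accumulation steps; a quick sanity check of the reported numbers:

```python
# Hyperparameters reported in this card
per_device_batch_size = 8
num_gpus = 4
grad_accum_steps = 2

# Effective (total) train batch size per optimizer step
effective_batch_size = per_device_batch_size * num_gpus * grad_accum_steps
print(effective_batch_size)  # 64, matching the stated total train batch size
```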