Model Overview
FlyPig23/Qwen3-4B_Paper_Impact_SFT_1ep is a specialized 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It was trained for a single epoch on the paper_impact_sft_train dataset, reaching a validation loss of 0.0623.
Key Characteristics
- Base Model: Qwen3-4B-Instruct-2507, a robust foundation for instruction-following tasks.
- Parameter Count: 4 billion parameters, offering a balance between performance and computational efficiency.
- Context Length: Supports a context window of 32,768 tokens.
- Training Focus: Fine-tuned specifically on a dataset related to "paper impact," suggesting an optimization for tasks within this domain.
- Training Hyperparameters: Utilized a learning rate of 2e-05, a total training batch size of 64, and a cosine learning rate scheduler with a 0.1 warmup ratio.
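The reported schedule (base learning rate 2e-05, cosine decay, 0.1 warmup ratio) can be sketched as a simple function. This is an illustrative reimplementation of the standard linear-warmup-plus-cosine-decay curve, not code extracted from the training run:

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay to 0.

    Mirrors the reported hyperparameters (base LR 2e-05,
    warmup ratio 0.1, cosine scheduler); a sketch for
    illustration, not the exact trainer implementation.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

For example, with 100 total steps the learning rate peaks at 2e-05 at step 10 (end of warmup) and decays to 0 by the final step.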
Potential Use Cases
- Academic Research Analysis: Potentially useful for tasks involving the analysis or summarization of research paper impact.
- Specialized SFT Tasks: Suitable for applications that benefit from supervised fine-tuning (SFT) on domain data, particularly datasets similar to paper_impact_sft_train.
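Like other Qwen instruct models, the base model expects prompts in the ChatML format (`<|im_start|>role ... <|im_end|>`). The helper below is a hypothetical illustration of how that prompt is assembled by hand; in practice, `tokenizer.apply_chat_template` from the `transformers` library handles this for you:

```python
def build_chatml_prompt(messages):
    """Format chat messages in the ChatML style used by Qwen
    instruct models: <|im_start|>role\ncontent<|im_end|>.

    Hypothetical helper for illustration only; prefer
    tokenizer.apply_chat_template in real code.
    """
    parts = []
    for msg in messages:
        parts.append(
            f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
        )
    # Open the assistant turn so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt(
    [{"role": "user", "content": "Summarize this paper's impact."}]
)
```

The resulting string would then be tokenized and passed to the model for generation.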