FlyPig23/Qwen3-4B_Paper_Impact_patent_SFT_1ep
FlyPig23/Qwen3-4B_Paper_Impact_patent_SFT_1ep is a 4 billion parameter language model, fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 architecture. This model is specifically optimized for tasks related to analyzing the impact of scientific papers and patents, demonstrating a low loss of 0.0589 on its evaluation set. It is designed for specialized applications requiring understanding and processing of academic and intellectual property documents.
Loading preview...
Model Overview
This model, named FlyPig23/Qwen3-4B_Paper_Impact_patent_SFT_1ep, is a 4 billion parameter language model. It is a fine-tuned iteration of the base model, Qwen/Qwen3-4B-Instruct-2507, specifically adapted for a niche domain.
Key Characteristics
- Base Model: Built upon the Qwen3-4B-Instruct-2507 architecture.
- Specialized Fine-tuning: The model has undergone supervised fine-tuning (SFT) using the
paper_impact_patents_traindataset, indicating a focus on academic and intellectual property content. - Performance Metric: Achieved a loss of 0.0589 on its evaluation set, suggesting effective learning within its specialized domain.
Training Details
The fine-tuning process involved specific hyperparameters:
- Learning Rate: 2e-05
- Batch Size: 8 (train and eval)
- Epochs: 1.0
- Optimizer: AdamW with default betas and epsilon.
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
Intended Use Cases
This model is particularly suited for applications that involve:
- Analyzing the impact of scientific papers.
- Processing and understanding patent documents.
- Tasks requiring domain-specific knowledge related to research and intellectual property.