FlyPig23/Qwen3-4B_Paper_Impact_code_SFT_1ep
FlyPig23/Qwen3-4B_Paper_Impact_code_SFT_1ep is a 4-billion-parameter, Qwen3-based instruction-tuned language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 on the paper_impact_code_train dataset, which suggests an emphasis on code generation and analysis in the context of research papers. It supports a 32768-token context window and targets specialized code-related applications.
Model Overview
FlyPig23/Qwen3-4B_Paper_Impact_code_SFT_1ep is a 4-billion-parameter language model built on the Qwen3 architecture and fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. Its distinguishing feature is specialized training on the paper_impact_code_train dataset, suggesting a focus on code-related tasks in academic and research contexts.
Key Characteristics
- Base Model: Qwen3-4B-Instruct-2507, a 4-billion-parameter instruction-tuned model.
- Specialized Fine-tuning: Trained for 1 epoch on the paper_impact_code_train dataset, reaching a reported evaluation loss of 0.0773.
- Context Length: Supports a 32768-token context window (see the configuration check below).
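The context window can be confirmed directly from the published model configuration. The snippet below is a minimal sketch using the standard transformers AutoConfig API; the repository id comes from this card, and the expected value is the context length stated above.

```python
from transformers import AutoConfig

# Load only the config (no weights) to inspect the context window.
config = AutoConfig.from_pretrained("FlyPig23/Qwen3-4B_Paper_Impact_code_SFT_1ep")

# Should report the 32768-token context window noted above.
print(config.max_position_embeddings)
```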
Training Details
The model was trained with the following hyperparameters (a configuration sketch follows the list):
- Learning Rate: 2e-05
- Batch Size: 8 (train) and 8 (eval) per device, with 2 gradient accumulation steps, for a total train batch size of 64 (8 × 2 × 4, implying 4 devices).
- Optimizer: AdamW with default betas and epsilon.
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
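For reference, here is a hedged sketch of how these hyperparameters map onto Hugging Face TrainingArguments. The actual training script is not published on this card, so the output directory and any values not listed above are illustrative assumptions; only the numeric hyperparameters come from the list.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen3-4b-paper-impact-code-sft",  # hypothetical path
    num_train_epochs=1,                 # 1 epoch of SFT, per the card
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,      # 8 x 2 x 4 devices = total batch size 64
    optim="adamw_torch",                # AdamW with default betas and epsilon
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```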
Potential Use Cases
Given its fine-tuning on a code-related dataset, this model is likely suitable for the following tasks (see the usage sketch after this list):
- Code Generation: Generating code snippets or functions based on specific requirements.
- Code Analysis: Assisting in understanding, summarizing, or refactoring code found in research papers.
- Research-focused Code Tasks: Applications requiring an understanding of code in an academic or scientific context.
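As a usage illustration, the following is a minimal sketch of prompting the model for a code-analysis task through its chat template. The prompt content is a made-up example, and the calls are standard transformers APIs rather than anything specific to this fine-tune.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FlyPig23/Qwen3-4B_Paper_Impact_code_SFT_1ep"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative code-analysis prompt; adapt to your own use case.
messages = [
    {
        "role": "user",
        "content": "Summarize what this function from a research paper does:"
                   "\n\ndef f(x):\n    return x * x",
    },
]

# Format the conversation with the model's chat template and generate a reply.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```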