SeongryongJung/Qwen3-4B-Chemistry-SDPO
SeongryongJung/Qwen3-4B-Chemistry-SDPO is a 4 billion parameter Qwen3-based causal language model fine-tuned specifically for chemistry-related tasks. Developed by SeongryongJung, this model utilizes SDPO (Self-Distillation Policy Optimization) and FSDP RL training on a SciKnowEval-style chemistry dataset. It is optimized for generalization in scientific domains, achieving a peak validation average score of 0.766369 on chemistry problems.
Loading preview...
Qwen3-4B Chemistry SDPO: Specialized for Scientific Generalization
This model, developed by SeongryongJung, is a 4 billion parameter variant of the Qwen3 architecture, specifically fine-tuned for chemistry tasks. It leverages Self-Distillation Policy Optimization (SDPO) and full-parameter FSDP Reinforcement Learning (RL) training to enhance its performance on scientific generalization problems.
Key Capabilities & Training Details
- Chemistry Specialization: Fine-tuned on a dedicated
sciknoweval/chemistrydataset, comprising 1,890 training examples and 210 validation examples. - Advanced RL Fine-tuning: Employs SDPO with a local SciKnowEval multiple-choice reward checker and token-level importance sampling for rollout correction.
- Performance: Achieved a peak validation
avg@16score of 0.766369 at step 20 during training, demonstrating its proficiency in chemistry problem-solving. - Checkpoint Availability: Offers a 'Root final' checkpoint and a 'best_avg16' checkpoint, corresponding to the highest validation performance.
- Context Length: Supports a maximum prompt length of 2048 tokens and a maximum response length of 8192 tokens, with a total model length of 10240 tokens.
Intended Use & Limitations
This model is primarily intended for research into RL fine-tuning and self-distillation behavior on science and generalization tasks. It is important to note that the reported scores are from a local experimental setup and should not be considered broad benchmark results without independent evaluation. The model has not undergone broad safety evaluations for production use.