Name: SeongryongJung/Qwen3-4B-Chemistry-SDPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: SeongryongJung

Qwen3-4B Chemistry SDPO: Specialized for Scientific Generalization

This model, developed by SeongryongJung, is a 4-billion-parameter variant of the Qwen3 architecture fine-tuned for chemistry tasks. It uses Self-Distillation Policy Optimization (SDPO) with full-parameter FSDP reinforcement learning (RL) training to improve performance on scientific generalization problems.

Key Capabilities & Training Details

Chemistry Specialization: Fine-tuned on the sciknoweval/chemistry dataset, which comprises 1,890 training examples and 210 validation examples.
Advanced RL Fine-tuning: Employs SDPO with a local SciKnowEval multiple-choice reward checker and token-level importance sampling for rollout correction.
Performance: Reached a peak validation avg@16 score of 0.766369 at training step 20.
Checkpoint Availability: Offers a 'Root final' checkpoint and a 'best_avg16' checkpoint; the latter corresponds to the highest validation performance.
Context Length: Supports a maximum prompt length of 2048 tokens and a maximum response length of 8192 tokens, for a total model length of 10240 tokens.
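
To make the avg@16 figure and the multiple-choice reward checker more concrete, here is a minimal Python sketch. It assumes that avg@16 means the mean accuracy over 16 sampled responses per validation prompt, averaged across prompts, and that responses end with a line such as "Answer: B". The listing does not spell out either detail, so the function names (extract_choice, reward, avg_at_k) and the answer format are hypothetical and not taken from the author's training code.

```python
import re
from statistics import mean
from typing import List, Optional

# Hypothetical answer format: "Answer: B" or "Answer: (B)".
# The lookahead rejects words such as "Answer: Carbon".
_ANSWER_RE = re.compile(r"Answer:\s*\(?([A-D])\)?(?![A-Za-z])")


def extract_choice(response: str) -> Optional[str]:
    """Return the last option letter stated in the response, or None."""
    matches = _ANSWER_RE.findall(response)
    return matches[-1] if matches else None


def reward(response: str, gold: str) -> float:
    """Binary multiple-choice reward: 1.0 if the stated letter matches gold."""
    return 1.0 if extract_choice(response) == gold else 0.0


def avg_at_k(samples: List[List[str]], golds: List[str], k: int = 16) -> float:
    """Mean reward over k samples per prompt, then averaged over prompts."""
    per_prompt = []
    for responses, gold in zip(samples, golds):
        if len(responses) != k:
            raise ValueError(f"expected {k} samples, got {len(responses)}")
        per_prompt.append(mean(reward(r, gold) for r in responses))
    return mean(per_prompt)


# Toy data: two prompts with 16 sampled responses each.
toy_samples = [
    ["Answer: B"] * 12 + ["Answer: A"] * 4,
    ["The product is the ester. Answer: (C)"] * 16,
]
toy_golds = ["B", "C"]
toy_score = avg_at_k(toy_samples, toy_golds)
```

Under this reading, a score of 0.766369 would mean that roughly 77% of sampled responses on the 210 validation examples picked the correct option. A real checker would need to match whatever answer format the training prompts actually request.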

Intended Use & Limitations

This model is primarily intended for research into RL fine-tuning and self-distillation behavior on science and generalization tasks. The reported scores come from a local experimental setup and should not be treated as broad benchmark results without independent evaluation. The model has not undergone broad safety evaluation for production use.
