SeongryongJung/Qwen3-4B-Chemistry-GRPO
SeongryongJung/Qwen3-4B-Chemistry-GRPO is a 4 billion parameter Qwen3-based language model fine-tuned by SeongryongJung using GRPO on a chemistry-specific dataset. This model is specialized for chemistry-related tasks, demonstrating a validation performance of 66.58% on the SciKnowEval chemistry split. It is designed to excel in applications requiring nuanced understanding and generation within the field of chemistry.
Loading preview...
Model Overview
SeongryongJung/Qwen3-4B-Chemistry-GRPO is a specialized 4 billion parameter language model, fine-tuned from the Qwen/Qwen3-4B base model. Its development focused on enhancing performance in chemistry-related tasks through the application of the GRPO (Generalized Reinforcement Learning from Policy Optimization) method on the chemistry split of a dataset.
Key Capabilities & Performance
This model is specifically optimized for chemistry applications. Its validation performance was measured using the val-aux/sciknoweval/reward/mean@16 metric, achieving a peak of 66.58% at step 100. This indicates its proficiency in handling complex chemistry-specific queries and tasks. The training process involved 100 steps, with performance steadily improving throughout.
Use Cases
- Chemistry-specific problem solving: Ideal for tasks requiring deep knowledge in chemistry.
- Research and development: Can assist in generating or analyzing chemical information.
- Educational tools: Potentially useful for creating chemistry-focused learning resources.
Technical Details
The model weights are the final global_step_100/actor checkpoint, converted from VERL FSDP shards to the Hugging Face format. The fine-tuning process was tracked via a W&B run (run-20260629_124519-qs487q2t).