SeongryongJung/Qwen3-4B-Chemistry-GRPO

TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 28, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

SeongryongJung/Qwen3-4B-Chemistry-GRPO is a 4 billion parameter Qwen3-based language model fine-tuned by SeongryongJung using GRPO on a chemistry-specific dataset. This model is specialized for chemistry-related tasks, demonstrating a validation performance of 66.58% on the SciKnowEval chemistry split. It is designed to excel in applications requiring nuanced understanding and generation within the field of chemistry.

Loading preview...

Model Overview

SeongryongJung/Qwen3-4B-Chemistry-GRPO is a specialized 4 billion parameter language model, fine-tuned from the Qwen/Qwen3-4B base model. Its development focused on enhancing performance in chemistry-related tasks through the application of the GRPO (Generalized Reinforcement Learning from Policy Optimization) method on the chemistry split of a dataset.

Key Capabilities & Performance

This model is specifically optimized for chemistry applications. Its validation performance was measured using the val-aux/sciknoweval/reward/mean@16 metric, achieving a peak of 66.58% at step 100. This indicates its proficiency in handling complex chemistry-specific queries and tasks. The training process involved 100 steps, with performance steadily improving throughout.

Use Cases

  • Chemistry-specific problem solving: Ideal for tasks requiring deep knowledge in chemistry.
  • Research and development: Can assist in generating or analyzing chemical information.
  • Educational tools: Potentially useful for creating chemistry-focused learning resources.

Technical Details

The model weights are the final global_step_100/actor checkpoint, converted from VERL FSDP shards to the Hugging Face format. The fine-tuning process was tracked via a W&B run (run-20260629_124519-qs487q2t).