Nalandadata/nalanda-qwen-7b-grpo
Nalandadata/nalanda-qwen-7b-grpo is a 7.6 billion parameter Qwen 2.5-based instruction-tuned language model developed by Nalandadata. It is specifically fine-tuned for solving Indian competitive exam questions (JEE/NEET) in Physics, Chemistry, Mathematics, and Biology. Utilizing Group Relative Policy Optimization (GRPO), this model achieves significant accuracy improvements on these specialized tasks while preserving general reasoning capabilities. It is optimized for accurate problem-solving and step-by-step reasoning in STEM subjects relevant to Indian curricula.
Nalanda Qwen 2.5 7B GRPO Overview
Nalandadata/nalanda-qwen-7b-grpo is a 7.6 billion parameter model based on Qwen 2.5-7B-Instruct, specifically fine-tuned to excel at solving questions from Indian competitive exams like JEE Mains, JEE Advanced, and NEET UG. This model demonstrates strong performance across Physics, Chemistry, Mathematics, and Biology.
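Since the model is built on Qwen 2.5-7B-Instruct, it can be queried through the standard chat-message interface. Below is a minimal sketch of building an exam-style MCQ prompt; the system prompt wording and the "Answer: (X)" instruction are illustrative assumptions, not part of the released model card.

```python
# Sketch: build a JEE/NEET-style MCQ chat prompt for the model.
# The system prompt and answer-format instruction are illustrative assumptions.

def build_mcq_prompt(question: str, options: list[str]) -> list[dict]:
    """Return a chat-message list suitable for tokenizer.apply_chat_template."""
    labels = ["A", "B", "C", "D"]
    option_block = "\n".join(f"({l}) {o}" for l, o in zip(labels, options))
    user_msg = (
        f"{question}\n{option_block}\n"
        "Reason step by step, then state the final answer as 'Answer: (X)'."
    )
    return [
        {"role": "system", "content": "You are an expert JEE/NEET tutor."},  # assumed
        {"role": "user", "content": user_msg},
    ]

messages = build_mcq_prompt(
    "A body of mass 2 kg moves at 3 m/s. What is its kinetic energy?",
    ["6 J", "9 J", "18 J", "3 J"],
)
# These messages can then be passed to the Nalandadata/nalanda-qwen-7b-grpo
# tokenizer via apply_chat_template(...) and generated from as usual.
```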
Key Capabilities & Training
This model was developed using a two-stage training pipeline:
- Stage 1: Light Supervised Fine-Tuning (SFT): A brief SFT phase (200 steps) using a mix of JEE/NEET questions and general instruction data (SlimOrca) to introduce domain-specific vocabulary and question formats without overwriting general knowledge.
- Stage 2: Group Relative Policy Optimization (GRPO): A more extensive training phase (600 steps) using 10,000 MCQs with verified answers. GRPO rewards the model for arriving at correct answers through its own reasoning, combining multiple reward functions for correctness, format compliance, and reasoning quality. This method preserves general reasoning abilities while enhancing specialized performance, unlike standard SFT, which can lead to catastrophic forgetting.
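The GRPO stage above combines several reward signals. The exact reward functions used by Nalandadata are not published; a minimal illustrative sketch of the three categories named (correctness, format compliance, reasoning quality) might look like this:

```python
import re

# Illustrative GRPO-style reward functions (assumptions, not the released
# training code). Each takes a model completion and returns a scalar reward.

def correctness_reward(completion: str, gold: str) -> float:
    """+1 if the stated final answer matches the verified answer letter."""
    m = re.search(r"Answer:\s*\(([A-D])\)", completion)
    return 1.0 if m and m.group(1) == gold else 0.0

def format_reward(completion: str) -> float:
    """Small bonus for emitting the expected 'Answer: (X)' pattern at all."""
    return 0.25 if re.search(r"Answer:\s*\([A-D]\)", completion) else 0.0

def reasoning_reward(completion: str) -> float:
    """Crude proxy for reasoning quality: reward multi-step work, capped."""
    steps = [line for line in completion.splitlines() if line.strip()]
    return min(len(steps) / 10.0, 0.5)

def total_reward(completion: str, gold: str) -> float:
    return (correctness_reward(completion, gold)
            + format_reward(completion)
            + reasoning_reward(completion))

worked = "KE = 1/2 m v^2\n= 0.5 * 2 * 9 = 9 J\nAnswer: (B)"
bare = "(C)"
```

In GRPO, a group of completions is sampled per question, each is scored with rewards like these, and advantages are computed relative to the group mean, so the model is reinforced only when its own reasoning beats its typical attempt.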
Performance Highlights
The model shows substantial improvements over the Qwen 2.5 7B baseline on held-out JEE/NEET MCQs:
- Overall Accuracy: 69.6% (a +9.1 percentage point improvement).
- Subject-specific Accuracy: Physics (+14.0pp to 65.0%), Chemistry (+10.0pp to 71.5%), Mathematics (+8.5pp to 64.5%), and Biology (+4.0pp to 77.5%).
Crucially, public benchmark performance on tasks like GSM8K, ARC-Challenge, and MMLU-Physics/Chemistry is preserved or even slightly improved, indicating no degradation of general reasoning.
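The per-subject figures are internally consistent with the headline numbers: assuming the four subjects are weighted equally, the tuned accuracies and improvements above average out to the reported 69.6% overall and +9.1pp. A quick cross-check (the equal weighting is an assumption):

```python
# Per-subject tuned accuracy (%) and improvement (percentage points)
# as reported in the evaluation above.
results = {
    "Physics":     (65.0, 14.0),
    "Chemistry":   (71.5, 10.0),
    "Mathematics": (64.5,  8.5),
    "Biology":     (77.5,  4.0),
}

# The Qwen 2.5 7B baseline per subject is tuned accuracy minus the improvement.
baselines = {s: acc - pp for s, (acc, pp) in results.items()}

# Assuming equal subject weighting, the means reproduce the headline figures.
overall_tuned = sum(acc for acc, _ in results.values()) / len(results)
overall_delta = sum(pp for _, pp in results.values()) / len(results)

print(round(overall_tuned, 1), round(overall_delta, 1))  # 69.6 9.1
```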
Ideal Use Cases
This model is particularly well-suited for applications requiring accurate and reasoned solutions to complex STEM problems, especially those formatted like Indian competitive exams. It can be used for educational tools, automated tutoring systems, or content generation related to these specific exam types.
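For automated tutoring or grading pipelines, the model's final choice must be parsed out of its step-by-step output. A small sketch, assuming the model follows an "Answer: (X)" convention (an assumption about the output format, not documented by the model card):

```python
import re

def extract_answer(completion: str) -> "str | None":
    """Pull the final MCQ choice (A-D) from a step-by-step completion.

    Assumes the completion ends with a line like 'Answer: (B)'; returns
    None when no such pattern is found so callers can fall back to
    re-prompting instead of silently grading a missing answer.
    """
    matches = re.findall(r"Answer:\s*\(?([A-D])\)?", completion)
    return matches[-1] if matches else None  # last match wins if restated

sample = (
    "Work done W = F * d = 10 N * 2 m = 20 J.\n"
    "Answer: (C)"
)
print(extract_answer(sample))  # C
```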