Nalandadata/nalanda-qwen-7b-grpo
Nalanda Qwen 2.5 7B GRPO is a fine-tuned Qwen2.5-7B-Instruct model developed by Nalandadata, specialized for solving Indian competitive exam questions (JEE/NEET) across Physics, Chemistry, Mathematics, and Biology. This model utilizes a two-stage training pipeline, including Group Relative Policy Optimization (GRPO), to achieve significant accuracy improvements on domain-specific MCQs while preserving general reasoning capabilities. It demonstrates an overall accuracy of 69.6% on JEE/NEET exams, making it suitable for educational technology applications focused on STEM competitive exam preparation.
Loading preview...
Nalanda Qwen 2.5 7B GRPO: Specialized for Indian Competitive Exams
Nalanda Qwen 2.5 7B GRPO is a specialized large language model developed by Nalandadata, fine-tuned from the Qwen2.5-7B-Instruct base model. Its primary purpose is to excel at solving multiple-choice questions (MCQs) from Indian competitive exams such as JEE Mains, JEE Advanced, and NEET UG, covering Physics, Chemistry, Mathematics, and Biology.
Key Capabilities & Training Methodology
This model employs a unique two-stage training pipeline:
- Stage 1: Light Supervised Fine-Tuning (SFT): Briefly introduced domain vocabulary and question formats using a mix of JEE/NEET questions and general instruction data.
- Stage 2: Group Relative Policy Optimization (GRPO): This crucial stage, inspired by advanced research, trained the model to arrive at correct answers through its own reasoning. Unlike standard SFT which can lead to catastrophic forgetting, GRPO rewards correctness, format compliance, and reasoning quality, preserving and enhancing general capabilities.
Performance Highlights
Nalanda Qwen 2.5 7B GRPO shows substantial improvements over the baseline Qwen 2.5 7B model on held-out JEE/NEET MCQs:
- Overall Accuracy: Achieves 69.6%, a +9.1 percentage point improvement.
- Subject-specific Accuracy: Physics (+14.0pp to 65.0%), Chemistry (+10.0pp to 71.5%), Mathematics (+8.5pp to 64.5%), and Biology (+4.0pp to 77.5%).
- Public Benchmark Preservation: Crucially, the model maintains or slightly improves performance on general reasoning benchmarks like GSM8K, ARC-Challenge, and MMLU-Physics/Chemistry, indicating no catastrophic forgetting.
Ideal Use Cases
This model is particularly well-suited for:
- EdTech platforms: Generating solutions or explanations for JEE/NEET-style questions.
- Automated tutoring systems: Providing step-by-step reasoning for STEM competitive exam problems.
- Content creation: Assisting in the development of educational materials for Indian competitive exams.