Model Overview
HeisenbergQ-0.5B-RL, developed by khazarai, is a specialized 0.5 billion parameter language model. It is a fine-tuned version of Qwen2.5-0.5B-Instruct, optimized specifically for quantum physics reasoning. The model leverages GRPO (Group Relative Policy Optimization) with custom reward functions to enhance its performance in this domain.
Key Capabilities
- Quantum Physics Problem Solving: Designed to solve and reason through complex quantum physics problems.
- Structured Output: Produces answers in a specific XML format with <reasoning> and <answer> tags, facilitating clear, step-by-step logical reasoning.
- Scientific Reasoning: Excels at general scientific reasoning in mathematics and physics contexts.
- Lightweight: Its 0.5B parameter size makes it a lightweight option for specialized tasks.
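Because the model emits its output inside <reasoning> and <answer> tags, completions can be consumed programmatically. Below is a minimal parsing sketch; the tag layout comes from this card, while the helper name and the sample completion are hypothetical:

```python
import re

def parse_response(text):
    """Extract the <reasoning> and <answer> blocks from a completion.

    Returns (reasoning, answer); either element is None if its tag pair
    is missing from the text.
    """
    def grab(tag):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return m.group(1).strip() if m else None
    return grab("reasoning"), grab("answer")

# Hypothetical completion in the format described above.
completion = (
    "<reasoning>The ground-state energy of the quantum harmonic "
    "oscillator is half a quantum of the oscillation energy.</reasoning>\n"
    "<answer>E_0 = (1/2) * hbar * omega</answer>"
)
reasoning, answer = parse_response(completion)
```

Returning `None` for missing tags (rather than raising) makes the parser safe to run on malformed completions, which a 0.5B model will occasionally produce.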
Training Details
The model was fine-tuned using GRPO with LoRA on the jilp00/YouToks-Instruct-Quantum-Physics-II dataset. Its training incorporated custom reward functions:
- Reasoning Quality Reward: Encourages logical markers and coherent chains of thought.
- Token Count Reward: Prevents overly verbose or sparse explanations.
- XML Reward: Strictly enforces the <reasoning> / <answer> output format.
- Soft Format Reward: Ensures robust handling of formatting edge cases.
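To illustrate how such format rewards can score completions, here is a minimal sketch. The reward names match the list above, but the card does not document the actual reward shapes, so the regexes, weights, and token thresholds below are all hypothetical:

```python
import re

# Strict layout: a <reasoning> block followed by an <answer> block,
# each on its own lines (assumed layout, not taken from the card).
STRICT_XML = re.compile(
    r"^<reasoning>\n.*?\n</reasoning>\n<answer>\n.*?\n</answer>\n?$",
    re.DOTALL,
)

def xml_reward(completion: str) -> float:
    # Full score only when the completion matches the strict layout exactly.
    return 1.0 if STRICT_XML.match(completion) else 0.0

def soft_format_reward(completion: str) -> float:
    # Partial credit when each tag pair is present, regardless of spacing.
    has_reasoning = "<reasoning>" in completion and "</reasoning>" in completion
    has_answer = "<answer>" in completion and "</answer>" in completion
    return 0.5 * has_reasoning + 0.5 * has_answer

def token_count_reward(completion: str, lo: int = 40, hi: int = 300) -> float:
    # Discourage overly sparse or verbose explanations
    # (word count as a crude proxy for token count).
    n = len(completion.split())
    return 1.0 if lo <= n <= hi else 0.0
```

Pairing a strict reward with a soft one is a common trick in GRPO training: the soft reward gives the policy a gradient toward the format early on, while the strict reward only pays out once the layout is exactly right.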
Limitations
Because it was trained on only about 1,000 specialized samples, the model may hallucinate outside the physics domain. Its small parameter count, while keeping it lightweight, also limits its reasoning depth compared to much larger general-purpose models.