shanyangmie/qwen3-vl-8b-thinking-physics-r2-sft-v1
shanyangmie/qwen3-vl-8b-thinking-physics-r2-sft-v1 is an 8 billion parameter Qwen3-VL model fine-tuned for physics problem-solving. Developed by shanyangmie, it generates structured reasoning trajectories that call a SymPy tool for symbolic expression evaluation. Given a physics problem, it produces a detailed, tool-augmented solution, which makes it suitable for advanced scientific reasoning tasks. The model has a context length of 32768 tokens and is a cold-start checkpoint for the Physics-R2 project.
Model Overview
shanyangmie/qwen3-vl-8b-thinking-physics-r2-sft-v1 is an 8 billion parameter Qwen3-VL model, serving as a cold-start checkpoint for the Physics-R2 project. This model is specifically fine-tuned to solve physics problems by generating structured reasoning trajectories that incorporate a SymPy tool for symbolic computation. It was trained on 1,776 audited tool-using physics trajectories from the shanyangmie/physr1corp-cold-start dataset.
Key Capabilities
- Tool-Augmented Reasoning: Emits structured reasoning trajectories that dynamically use a SymPy tool. A runtime harness executes SymPy expressions and injects results back into the model's thought process.
- Physics Problem Solving: Specialized in tackling physics problems, producing detailed, step-by-step solutions.
- Qwen3-VL Base: Built upon the Qwen3-VL-8B-Thinking architecture, providing a strong foundation for multimodal understanding, though the visual tower was frozen during this specific SFT v1.
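The harness step described under Tool-Augmented Reasoning could look roughly like the sketch below. It is a minimal illustration, not the project's actual harness: the tag layout `<tool>EXPR</tool><tool_result>VALUE</tool_result>` is an assumption inferred from the tag names in this card, and `splice_tool_result` and `sympy_evaluate` are hypothetical helper names. The function finds the first tool call in the generated text, discards anything the model wrote after it, and appends the result computed by the evaluator.

```python
import re
from typing import Callable, Optional

# Assumed tag layout (not confirmed by the model card):
#   <tool>EXPR</tool><tool_result>VALUE</tool_result>
TOOL_CALL = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)


def sympy_evaluate(expression: str) -> str:
    """Hypothetical evaluator: parse and simplify an expression with SymPy."""
    import sympy  # imported lazily so the splicing logic works without SymPy

    try:
        # Note: sympify evaluates the string, so only run it on trusted
        # or sandboxed input.
        return str(sympy.sympify(expression))
    except Exception as exc:  # report failures to the model as text
        return f"error: {exc}"


def splice_tool_result(
    generated: str,
    evaluate: Callable[[str], str] = sympy_evaluate,
) -> Optional[str]:
    """Return the text up to the first tool call plus the real tool result.

    Anything the model emitted after the closing </tool> tag (including a
    self-written <tool_result> block) is dropped. Returns None when the
    text contains no tool call.
    """
    match = TOOL_CALL.search(generated)
    if match is None:
        return None
    result = evaluate(match.group(1).strip())
    return generated[: match.end()] + f"<tool_result>{result}</tool_result>"
```

In a generation loop, the returned string would be fed back to the model as the new prefix and decoding resumed, repeating until `splice_tool_result` returns `None`. Stopping generation at `</tool>` with a stop sequence, where the serving stack supports one, avoids generating the discarded text in the first place.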
Training Details
- Fine-tuned from Qwen/Qwen3-VL-8B-Thinking using TRL SFTTrainer on FSDP1.
- Trained for 3 epochs with a sequence length of 4096 and bfloat16 mixed precision.
- Achieved an eval token accuracy of 84.7%.
Limitations & Known Issues
- Text-Only SFT: The visual tower was frozen during this v1 SFT, meaning the model does not currently utilize image content for multimodal physics problems.
- Mild Overfitting: Eval loss showed a slight increase in the final epoch, suggesting mild overfitting.
- Hallucination Risk: The model was trained to emit the full <tool>...</tool_result> pattern, so it requires a proper inference harness to keep it from hallucinating tool results.