shanyangmie/qwen3-vl-8b-thinking-physics-r2-sft-v1

Vision | Concurrency cost: 1 | Model size: 8B | Quant: FP8 | Context length: 32k | Published: May 29, 2026 | License: apache-2.0 | Architecture: Transformer | Open weights | Cold

shanyangmie/qwen3-vl-8b-thinking-physics-r2-sft-v1 is an 8-billion-parameter Qwen3-VL model fine-tuned for physics problem-solving. Developed by shanyangmie, it generates structured reasoning trajectories that call a SymPy tool to evaluate symbolic expressions, producing detailed, tool-augmented solutions to physics problems; this makes it suited to advanced scientific reasoning tasks. The model has a context length of 32,768 tokens and is the cold-start checkpoint for the Physics-R2 project.


Model Overview

shanyangmie/qwen3-vl-8b-thinking-physics-r2-sft-v1 is an 8-billion-parameter Qwen3-VL model that serves as the cold-start checkpoint for the Physics-R2 project. It is fine-tuned to solve physics problems by generating structured reasoning trajectories that call a SymPy tool for symbolic computation, and it was trained on 1,776 audited, tool-using physics trajectories from the shanyangmie/physr1corp-cold-start dataset.

Key Capabilities

  • Tool-Augmented Reasoning: Emits structured reasoning trajectories that dynamically use a SymPy tool. A runtime harness executes SymPy expressions and injects results back into the model's thought process.
  • Physics Problem Solving: Specialized in tackling physics problems, producing detailed, step-by-step solutions.
  • Qwen3-VL Base: Built on the Qwen3-VL-8B-Thinking architecture, which provides a strong foundation for multimodal understanding, although the visual tower was frozen during this v1 SFT run.
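The harness step described above (execute the SymPy expression, then feed the result back) can be sketched as follows. This is a minimal illustration, not the project's actual harness: the `<tool>`/`</tool>` and `<tool_result>` delimiters, the assumption that each call holds a single SymPy expression string, and the helper names `run_sympy` and `tool_result_for` are all hypothetical.

```python
import re
from typing import Optional

import sympy as sp

# Hypothetical tag layout: the card only names the overall <tool>...</tool_result>
# pattern, so the exact delimiters used here are assumptions.
TOOL_CALL = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)


def run_sympy(expression: str) -> str:
    """Parse and simplify one SymPy expression, returning its string form.

    sympify() relies on eval(), so a production harness should sandbox this.
    Errors are returned as text so the model can see and react to them.
    """
    try:
        return str(sp.simplify(sp.sympify(expression)))
    except Exception as exc:  # report any parse/evaluation failure back to the model
        return f"error: {exc}"


def tool_result_for(generated: str) -> Optional[str]:
    """Return a <tool_result> block for the last tool call, or None if there is none."""
    calls = TOOL_CALL.findall(generated)
    if not calls:
        return None
    return f"<tool_result>{run_sympy(calls[-1].strip())}</tool_result>"
```

For example, `tool_result_for("Work done: <tool>integrate(x**2, (x, 0, 3))</tool>")` returns `"<tool_result>9</tool_result>"`. Returning errors as text rather than raising lets the model see a failed call and try again.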

Training Details

  • Fine-tuned from Qwen/Qwen3-VL-8B-Thinking using TRL SFTTrainer on FSDP1.
  • Trained for 3 epochs with a sequence length of 4096 and bfloat16 mixed precision.
  • Achieved an eval token accuracy of 84.7%.
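The eval token accuracy above is presumably the usual next-token metric: the share of supervised (non-masked) label tokens whose argmax prediction matches the label. The sketch below computes it in plain Python under that assumption, using the common Hugging Face convention of `-100` for masked positions; the function name is illustrative, and TRL's logged metric may differ in details such as how it averages across batches or devices.

```python
from typing import Sequence

IGNORE_INDEX = -100  # label value conventionally used for masked (e.g. prompt) tokens


def token_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of supervised next-token predictions whose argmax matches the label.

    logits[t] scores the token at position t + 1, so predictions are compared
    with labels shifted left by one; positions labelled IGNORE_INDEX are skipped.
    """
    correct = total = 0
    for step_logits, target in zip(logits[:-1], labels[1:]):
        if target == IGNORE_INDEX:
            continue
        predicted = max(range(len(step_logits)), key=step_logits.__getitem__)
        correct += int(predicted == target)
        total += 1
    return correct / total if total else 0.0
```

Masking matters here: with prompt tokens set to `-100`, the metric only scores the trajectory tokens the model was actually trained to produce.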

Limitations & Known Issues

  • Text-Only SFT: The visual tower was frozen during this v1 SFT, so the model does not currently make use of image content in multimodal physics problems.
  • Mild Overfitting: Eval loss showed a slight increase in the final epoch, suggesting mild overfitting.
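  • Hallucination Risk: Because the model was trained to emit the full <tool>...</tool_result> pattern, it can also write its own tool results. It needs an inference harness that stops generation after each tool call and injects the real SymPy output; otherwise the model may hallucinate those results.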
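One way a harness can guard against self-generated tool results is to discard everything after the first completed tool call, run the real tool, and only then let the model continue. The loop below is a minimal sketch of that idea. The delimiters, the `generate` callable (any function returning the model's continuation for a transcript) and the `execute` callable (any SymPy evaluator) are hypothetical; in practice one would usually also pass `</tool>` as a stop sequence to the generation call, with the truncation acting as a safety net.

```python
from typing import Callable, Tuple

# Hypothetical delimiters: the card names only the overall <tool>...</tool_result> pattern.
TOOL_START, TOOL_END = "<tool>", "</tool>"
RESULT_START, RESULT_END = "<tool_result>", "</tool_result>"


def truncate_at_tool_call(text: str) -> Tuple[str, bool]:
    """Keep text up to the first completed tool call and drop anything after it.

    Whatever the model wrote past the call (such as a self-generated
    <tool_result> block) is discarded so the harness can insert the real one.
    """
    end = text.find(TOOL_END)
    if end == -1:
        return text, False
    return text[: end + len(TOOL_END)], True


def run_with_tools(
    generate: Callable[[str], str],
    execute: Callable[[str], str],
    prompt: str,
    max_calls: int = 8,
) -> str:
    """Alternate model generation and real tool execution until no call remains."""
    transcript = prompt
    for _ in range(max_calls):
        continuation, called = truncate_at_tool_call(generate(transcript))
        transcript += continuation
        if not called:
            return transcript
        # Expression between the last <tool> tag and the closing </tool>.
        expression = continuation.rsplit(TOOL_START, 1)[-1][: -len(TOOL_END)]
        transcript += f"\n{RESULT_START}{execute(expression.strip())}{RESULT_END}\n"
    return transcript  # give up after max_calls tool calls to avoid infinite loops
```

Capping the number of calls keeps a model that keeps requesting tools from looping forever; the returned transcript then simply ends at the last injected result.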