mjf-su/PhysicalAI-base-VLA
PhysicalAI-reason-VLA by mjf-su is a 4 billion parameter vision-language driving policy model, fine-tuned from PhysicalAI-base-VLA (based on Qwen3-VL-4B-Thinking) with a 32768 token context length. It combines structured chain-of-thought reasoning with discrete driving decisions and was trained on 10,000 Gemini-annotated driving scenes. Given camera images and past vehicle motion, it generates explicit driving justifications, specific longitudinal and lateral actions, and future trajectory waypoints.
PhysicalAI-reason-VLA: Vision-Language Driving Policy with Reasoning
PhysicalAI-reason-VLA is a 4 billion parameter vision-language model developed by mjf-su and fine-tuned as an autonomous driving policy. It builds on PhysicalAI-base-VLA (itself based on Qwen3-VL-4B-Thinking) and adds structured chain-of-thought reasoning and discrete driving decisions.
Key Capabilities
- Integrated Reasoning and Action: Generates a detailed reasoning trace (<think>), explicit longitudinal and lateral driving decisions (<action>), and future trajectory waypoints (<wp>) in sequence.
- Discrete Decision Tokens: Utilizes genuine single tokens for a comprehensive set of driving actions (e.g., <stop>, <turn_left>, <lane_keep>), allowing efficient probability measurement over the full decision space.
- Contextual Understanding: Processes forward-facing camera images and past ego-vehicle waypoints to inform its driving policy.
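The output structure and decision-token idea described above can be illustrated with a small, library-free sketch. It rests on several assumptions that the model card does not confirm: that each section is wrapped in an opening and closing pair such as <think>...</think>, that decision tokens appear inside the <action> section, and that you have already extracted one logit per decision token at the decision position. The function names, the sample string, and the logit values are all hypothetical.

```python
import math
import re


def parse_policy_output(text):
    """Split generated text into reasoning, decision tokens, and raw waypoint text.

    Assumes (not confirmed by the model card) that each section is wrapped in
    an opening and closing tag pair, e.g. <think>...</think>.
    """
    sections = {}
    for tag in ("think", "action", "wp"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    # Assumed: decision tokens look like <stop> or <lane_keep> inside <action>.
    decisions = re.findall(r"<(\w+)>", sections["action"] or "")
    return {
        "reasoning": sections["think"],
        "decisions": decisions,
        "waypoints_raw": sections["wp"],  # left unparsed: waypoint format is unknown
    }


def decision_distribution(logits_by_token):
    """Softmax restricted to the decision tokens, renormalised over that set."""
    peak = max(logits_by_token.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits_by_token.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical generated string, for illustration only.
sample = (
    "<think>Red light ahead.</think>"
    "<action><stop><lane_keep></action>"
    "<wp>(hypothetical waypoint text)</wp>"
)
parsed = parse_policy_output(sample)

# Made-up logits; in practice these would be read from the model's
# next-token logits at the position where a decision token is emitted.
example_logits = {"<stop>": 2.0, "<turn_left>": 0.0, "<lane_keep>": 1.0}
probs = decision_distribution(example_logits)
```

Because each decision is a single token, one forward pass at the decision position would be enough to score every candidate action, which is what makes this kind of renormalised distribution cheap to compute.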
Training Details
This model was trained with supervised fine-tuning (SFT) via TRL on 10,000 Gemini-annotated driving scenes from the mjf-su/PhysicalAI-reason-US dataset. Training ran for 2 epochs and taught the model to generate the chain-of-thought labels attached to real US driving data.
Good for
- Developing autonomous driving systems that require explicit, interpretable decision-making.
- Research into explainable AI for robotic control and vision-language navigation.
- Applications needing a model that can not only predict trajectories but also justify its actions with structured reasoning.