Name: mjf-su/PhysicalAI-base-VLA API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: mjf-su

PhysicalAI-reason-VLA: Vision-Language Driving Policy with Reasoning

PhysicalAI-reason-VLA is a 4 billion parameter vision-language model developed by mjf-su, specifically fine-tuned for autonomous driving policy. Building upon the PhysicalAI-base-VLA (which itself is based on Qwen3-VL-4B-Thinking), this model introduces a novel approach by incorporating structured chain-of-thought reasoning and discrete driving decisions.

Key Capabilities

Integrated Reasoning and Action: Generates a detailed reasoning trace (<think>), explicit longitudinal and lateral driving decisions (<action>), and future trajectory waypoints (<wp>) in sequence.
Discrete Decision Tokens: Utilizes genuine single tokens for a comprehensive set of driving actions (e.g., <stop>, <turn_left>, <lane_keep>), allowing efficient probability measurement over the full decision space.
Contextual Understanding: Processes forward-facing camera images and past ego-vehicle waypoints to inform its driving policy.

Training Details

This model was fine-tuned using supervised fine-tuning (SFT) via TRL on 10,000 Gemini-annotated driving scenes from the mjf-su/PhysicalAI-reason-US dataset. The training focused on generating chain-of-thought labels on real US driving data over 2 epochs.

Good for

Developing autonomous driving systems that require explicit, interpretable decision-making.
Research into explainable AI for robotic control and vision-language navigation.
Applications needing a model that can not only predict trajectories but also justify its actions with structured reasoning.

Overview

PhysicalAI-reason-VLA: Vision-Language Driving Policy with Reasoning

Key Capabilities

Training Details

Good for

Full Model Card (README)