shreethar/stage1_unsloth
shreethar/stage1_unsloth is a 4.5 billion parameter, natively multimodal Vision-Language-Action (VLA) model developed by Shreethar at Universiti Teknikal Malaysia Melaka (UTeM). Based on Qwen3.5-4B, it has been fine-tuned via supervised instruction tuning on eight robot-domain datasets to establish foundational robotic knowledge. This model serves as Stage 1 of the ReasonFlow VLA pipeline, specializing in robot grounding for tasks like trajectory prediction, affordance grounding, and task planning.
Loading preview...
ReasonFlow VLA — Stage 1: Robot Grounding SFT
This model, shreethar/stage1_unsloth, is the initial checkpoint for ReasonFlow VLA, a multi-stage Vision-Language-Action system developed by Shreethar at UTeM. It is a Qwen3.5-4B (natively multimodal) model, fine-tuned using Supervised Fine-Tuning (SFT) via Unsloth.
Key Capabilities & Training
- Natively Multimodal: Processes both vision and language inputs, with an image resolution of 448 × 448.
- Robot-Domain Grounding: Fine-tuned on approximately 560,000 samples across eight specialized robot-domain datasets.
- Diverse Robotic Tasks: Training data covers:
- 2D end-effector trajectory prediction (MolmoAct Trajectory)
- Robot visual question answering (RoboVQA, Pixmo Cap-QA, Pixmo AMA)
- Failure analysis and correction QA (RoboFAC)
- Affordance bounding box prediction (ShareRobot Affordance)
- Multi-step task planning QA (ShareRobot Planning)
- Dense image captioning (Pixmo Cap)
- Instruction-Tuned: All samples follow a two-turn chat format, enabling the model to output normalized waypoint lists for trajectory tasks and free-form text for QA tasks.
- Foundational Knowledge: Establishes core robotic understanding before further stages involving RL or distillation.
Project Context
This model represents Stage 1 of the ReasonFlow VLA pipeline, focusing on robot grounding. It is designed to be the shared initialization point for both Teacher and Student models in the subsequent Stage 2 (GRPO Teacher-Student Distillation), which is currently in progress. The full project repository is available on GitHub.