ACE-Brain/ACE-Brain-0-8B

VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Feb 26, 2026License:mitArchitecture:Transformer0.0K Open Weights Cold

ACE-Brain-0-8B is an 8 billion parameter multimodal foundation model developed by ACE-Brain, built upon a unified MLLM architecture with a 32K context length. It is specifically designed for embodied intelligence, unifying perception, reasoning, and decision-making across diverse domains like spatial cognition, autonomous driving, low-altitude sensing, and embodied interaction. The model learns a shared spatial reasoning substrate, enabling strong generalization across heterogeneous physical environments and agent embodiments.

Loading preview...

Overview

ACE-Brain-0-8B is an 8 billion parameter generalist multimodal foundation model (MLLM) developed by ACE-Brain, designed for embodied intelligence. It unifies perception, reasoning, and decision-making across diverse physical domains by learning a shared spatial reasoning substrate.

Key Capabilities

  • Unified Multimodal Architecture: Integrates perception, reasoning, and decision-making for embodied tasks.
  • Strong Spatial Reasoning: Utilizes spatial intelligence as a core scaffold for universal generalization.
  • Diverse Embodiment Support: Applicable to spatial cognition, autonomous driving, low-altitude sensing, and embodied interaction.
  • Cross-Domain Generalization: Excels in perception, reasoning, and planning across various complex environments.

Performance Highlights

ACE-Brain-0-8B has been extensively evaluated across 24 benchmarks covering its specialized domains, consistently achieving state-of-the-art or competitive performance against existing open-source and some closed-source models. It demonstrates robust capabilities in environment understanding, motion reasoning, planning-aware prediction, and physical interaction understanding, particularly in complex and safety-critical scenarios like driving and aerial domains. The model's spatial intelligence-based training enhances overall visual-language intelligence without limiting generalization.

Good For

  • Developing applications requiring advanced spatial cognition.
  • Enhancing autonomous driving systems with improved environment understanding and planning.
  • Applications in low-altitude sensing for drones and aerial vehicles.
  • Creating intelligent agents for embodied interaction in virtual or physical environments.