stepfun-ai/Step-3.5-Flash

Warm
Public
199B
FP8
32768
Feb 1, 2026
License: apache-2.0
Overview

Step 3.5 Flash: Frontier Reasoning and Agentic Capabilities

Step 3.5 Flash, developed by stepfun-ai, is an open-source foundation model built on a sparse Mixture of Experts (MoE) architecture. Of its 196.81 billion total parameters, only about 11 billion are activated per token, which lets it deliver reasoning depth comparable to top-tier proprietary models while keeping per-token compute low.
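To make the sparse-activation numbers concrete, the short sketch below works out what share of the parameters is used per token and roughly how much memory the weights alone would occupy. It is a back-of-the-envelope estimate only: the one-byte-per-parameter figure assumes every weight is stored in FP8, and the helper names are illustrative rather than part of any library.

```python
# Back-of-the-envelope figures based on the parameter counts quoted above.
TOTAL_PARAMS_B = 196.81   # total parameters, in billions
ACTIVE_PARAMS_B = 11.0    # approximate active parameters per token, in billions


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of the model's parameters used for a single token."""
    return active_b / total_b


def weight_memory_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Ignores the KV cache, activations, and runtime overhead.
    """
    return params_b * bytes_per_param


fraction = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
fp8_weights_gb = weight_memory_gb(TOTAL_PARAMS_B, 1.0)  # assumption: FP8 for all weights

print(f"Active fraction per token: {fraction:.1%}")
print(f"Weights at 1 byte/param: about {fp8_weights_gb:.0f} GB")
```

Under these assumptions, roughly 5.6% of the parameters take part in each forward step, while the full set of weights still has to be resident in memory.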

Key Capabilities

  • Deep Reasoning at Speed: Uses 3-way Multi-Token Prediction (MTP-3) to reach a generation throughput of 100–300 tok/s (peaking at 350 tok/s for coding), so complex, multi-step reasoning chains stay responsive.
  • Robust Engine for Coding & Agents: Purpose-built for agentic tasks with a scalable RL framework, achieving 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0.
  • Efficient Long Context: Supports a 256K context window at reduced compute cost through a hybrid attention layout with a 3:1 Sliding Window Attention (SWA) ratio, that is, three SWA layers for every full-attention layer.
  • Accessible Local Deployment: Optimized for secure local inference on high-end consumer hardware, ensuring data privacy without sacrificing performance.
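As a rough illustration of why a 3:1 SWA ratio lowers long-context cost, the sketch below builds a layer pattern and compares how many token positions the KV cache must hold against an all-full-attention stack. It assumes the ratio means three sliding-window layers per full-attention layer. The layer count and window size are hypothetical placeholders, not the model's published configuration.

```python
def layer_pattern(num_layers: int, swa_per_full: int = 3) -> list[str]:
    """Repeat `swa_per_full` sliding-window layers followed by one full-attention layer."""
    period = swa_per_full + 1
    return ["full" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def cached_tokens(pattern: list[str], context_len: int, window: int) -> int:
    """Total token positions held in the KV cache, summed over all layers."""
    return sum(
        context_len if kind == "full" else min(window, context_len)
        for kind in pattern
    )


NUM_LAYERS = 48        # hypothetical layer count, for illustration only
WINDOW = 512           # hypothetical sliding-window size, for illustration only
CONTEXT = 256 * 1024   # the 256K context window mentioned above

pattern = layer_pattern(NUM_LAYERS)
hybrid = cached_tokens(pattern, CONTEXT, WINDOW)
dense = NUM_LAYERS * CONTEXT  # every layer attends over the full context

print(pattern[:8])
print(f"KV cache relative to all-full-attention: {hybrid / dense:.1%}")
```

With these placeholder values the full-attention layers dominate the cache, so the hybrid stack needs only about a quarter of the positions a fully dense stack would. The exact saving depends on the real layer count and window size.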

Good for

  • Agentic AI applications requiring fast, complex, multi-step reasoning.
  • Coding tasks and long-horizon problem-solving in development environments.
  • Real-time interactive systems where high generation throughput is critical.
  • Local deployments on consumer-grade hardware for privacy-sensitive or offline use cases.