zai-org/GLM-4.7-Flash

Status: Warm
Visibility: Public
Parameters: 30B
Precision: FP8
Context length: 32768
Released: Jan 19, 2026
License: MIT
Hosted on: Hugging Face
Overview

GLM-4.7-Flash: A Powerful 30B MoE Model

GLM-4.7-Flash, developed by zai-org, is a 30 billion parameter Mixture-of-Experts (MoE) model positioned as a leading option in its class for balancing performance and efficiency. It is designed for lightweight deployment while maintaining strong capabilities across a range of complex tasks.

Key Capabilities & Performance

GLM-4.7-Flash demonstrates competitive, and often superior, performance against other 30B-class models as well as GPT-OSS-20B across several benchmarks:

  • AIME 25: Achieves 91.6, outperforming Qwen3-30B-A3B-Thinking-2507.
  • GPQA: Scores 75.2, surpassing both Qwen3-30B-A3B-Thinking-2507 and GPT-OSS-20B.
  • HLE: Records 14.4, notably higher than the compared models.
  • SWE-bench Verified: Achieves 59.2, indicating strong coding and problem-solving abilities, far exceeding Qwen3-30B-A3B-Thinking-2507 (22.0) and GPT-OSS-20B (34.0).
  • τ²-Bench: Scores 79.5, showcasing robust performance in multi-turn agentic tasks.
  • BrowseComp: Achieves 42.8, demonstrating advanced browsing and comprehension skills.

Deployment and Usage

GLM-4.7-Flash supports local deployment using popular inference frameworks such as vLLM and SGLang, with comprehensive instructions available in the official GitHub repository. It also integrates with the transformers library for ease of use. For multi-turn agentic tasks, users can leverage its "Preserved Thinking mode" for optimized performance.
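A minimal local-serving sketch with vLLM, assuming the checkpoint is published under the Hugging Face id zai-org/GLM-4.7-Flash (as listed on this page) and that your installed vLLM version supports the model's architecture; the port, prompt, and sampling parameters are illustrative, not prescriptive.

```shell
# Install vLLM and serve the model behind an OpenAI-compatible endpoint.
pip install vllm
vllm serve zai-org/GLM-4.7-Flash --max-model-len 32768 --port 8000

# From another terminal, query the OpenAI-compatible chat completions API.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "zai-org/GLM-4.7-Flash",
        "messages": [{"role": "user", "content": "Summarize Mixture-of-Experts in two sentences."}],
        "max_tokens": 128
      }'
```

Setting --max-model-len to 32768 matches the context window advertised in the page metadata; a shorter value can reduce GPU memory pressure on smaller cards.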