zai-org/GLM-4.7-Flash

Available on Hugging Face

Text Generation · Concurrency Cost: 2 · Model Size: 30B · Quant: FP8 · Context Length: 32k · Published: Jan 19, 2026 · License: MIT · Architecture: Transformer · Open Weights

GLM-4.7-Flash is a 30 billion parameter Mixture-of-Experts (MoE) model developed by zai-org, designed for efficient and high-performance lightweight deployment. It demonstrates strong capabilities across various benchmarks, particularly excelling in agentic tasks, reasoning, and coding. This model offers a balanced solution for performance and efficiency in the 30B class.


GLM-4.7-Flash: A Powerful 30B MoE Model

GLM-4.7-Flash, developed by zai-org, is a 30 billion parameter Mixture-of-Experts (MoE) model positioned as a leading option in its class for balancing performance and efficiency. It is designed for lightweight deployment while maintaining strong capabilities across a range of complex tasks.

Key Capabilities & Performance

GLM-4.7-Flash demonstrates competitive, and often superior, performance against other 30B-class models as well as GPT-OSS-20B across several benchmarks:

  • AIME 25: Achieves 91.6, outperforming Qwen3-30B-A3B-Thinking-2507.
  • GPQA: Scores 75.2, surpassing both Qwen3-30B-A3B-Thinking-2507 and GPT-OSS-20B.
  • HLE: Records 14.4, higher than the compared models.
  • SWE-bench Verified: Achieves 59.2, indicating strong coding and problem-solving abilities, far exceeding Qwen3-30B-A3B-Thinking-2507 (22.0) and GPT-OSS-20B (34.0).
  • τ²-Bench: Scores 79.5, showcasing robust performance in multi-turn agentic tasks.
  • BrowseComp: Achieves 42.8, demonstrating advanced browsing and comprehension skills.

Deployment and Usage

GLM-4.7-Flash supports local deployment using popular inference frameworks such as vLLM and SGLang, with comprehensive instructions available in the official GitHub repository. It also integrates with the transformers library for ease of use. For multi-turn agentic tasks, users can leverage its "Preserved Thinking mode" for optimized performance.
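As a rough sketch of the vLLM path, the model can be served behind an OpenAI-compatible endpoint. This assumes vLLM is installed, the checkpoint name matches the Hugging Face repo ID above, and your hardware has enough memory for a 30B FP8 MoE; flags and defaults may differ across vLLM versions.

```shell
# Sketch: serve GLM-4.7-Flash locally via vLLM's OpenAI-compatible server.
# Assumes the "zai-org/GLM-4.7-Flash" repo ID and a recent vLLM release.
pip install vllm

vllm serve zai-org/GLM-4.7-Flash \
    --max-model-len 32768        # match the model's 32k context length

# The server then exposes an OpenAI-compatible API at http://localhost:8000/v1
```

SGLang and plain transformers follow analogous setup paths; see the official GitHub repository for the authoritative instructions.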

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each configuration specifies the following parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
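To illustrate how these sampler parameters are passed in practice, the request below targets a local OpenAI-compatible endpoint (such as one served by vLLM, which accepts top_k, repetition_penalty, and min_p as extra sampling fields). The numeric values are placeholders for illustration only, not the actual top Featherless configurations.

```shell
# Illustrative chat request with explicit sampler settings.
# Values are placeholders; substitute your preferred configuration.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-4.7-Flash",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.05,
    "min_p": 0.05
  }'
```

Hosted APIs that follow the OpenAI schema strictly may reject the non-standard fields (top_k, repetition_penalty, min_p), so check your provider's documentation before including them.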