zai-org/GLM-4.7-Flash
GLM-4.7-Flash is a 30 billion parameter Mixture-of-Experts (MoE) model developed by zai-org, designed for efficient and high-performance lightweight deployment. It demonstrates strong capabilities across various benchmarks, particularly excelling in agentic tasks, reasoning, and coding. This model offers a balanced solution for performance and efficiency in the 30B class.
GLM-4.7-Flash: A Powerful 30B MoE Model
GLM-4.7-Flash, developed by zai-org, is a 30 billion parameter Mixture-of-Experts (MoE) model positioned as a leading option in its class for balancing performance and efficiency. It is designed for lightweight deployment while maintaining strong capabilities across a range of complex tasks.
Key Capabilities & Performance
GLM-4.7-Flash demonstrates competitive, and often superior, performance against other models in its class, including Qwen3-30B-A3B-Thinking-2507 and GPT-OSS-20B, across several benchmarks:
- AIME 25: Achieves 91.6, outperforming Qwen3-30B-A3B-Thinking-2507.
- GPQA: Scores 75.2, surpassing both Qwen3-30B-A3B-Thinking-2507 and GPT-OSS-20B.
- HLE: Records 14.4, significantly higher than competitors.
- SWE-bench Verified: Achieves 59.2, indicating strong coding and problem-solving abilities, far exceeding Qwen3-30B-A3B-Thinking-2507 (22.0) and GPT-OSS-20B (34.0).
- τ²-Bench: Scores 79.5, showcasing robust performance in multi-turn agentic tasks.
- BrowseComp: Achieves 42.8, demonstrating advanced browsing and comprehension skills.
Deployment and Usage
GLM-4.7-Flash supports local deployment with popular inference frameworks such as vLLM and SGLang, with detailed instructions available in the official GitHub repository. It also integrates with the transformers library for straightforward use in Python, as sketched below. For multi-turn agentic tasks, users can enable its "Preserved Thinking mode" to carry reasoning content across turns.
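As a minimal sketch of the transformers path described above, loading and prompting the model might look like the following. The generation settings shown here are illustrative assumptions, not recommendations from the official documentation; consult the GitHub repository for the exact configuration.

```python
# Minimal sketch: loading GLM-4.7-Flash via the transformers library.
# The sampling settings below are assumptions for illustration; some GLM
# releases may also require trust_remote_code=True on older transformers
# versions -- check the official repository for specifics.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/GLM-4.7-Flash"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let transformers pick an appropriate dtype
    device_map="auto",    # shard the MoE weights across available GPUs
)

messages = [
    {"role": "user", "content": "Write a Python function that checks if a number is prime."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For serving, a command along the lines of `vllm serve zai-org/GLM-4.7-Flash` is a typical starting point for an OpenAI-compatible endpoint, though the exact flags (for example, any reasoning or tool-call parsing options) should be taken from the official vLLM and SGLang instructions in the repository.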