0G-AI/0GM-1.0-35B-A3B-0427
0GM-1.0-35B-A3B (Preview-0427) is a 35.1-billion-parameter Mixture-of-Experts (MoE) model developed by 0G-AI. It is built on the Qwen 3.6 35B-A3B architecture, with approximately 3 billion parameters active per token. The model is fine-tuned for reasoning and reports improvements on the MMLU-Pro, AIME 2026, GSM-8K, and MATH-500 benchmarks. It has a native context length of 262,144 tokens, extensible up to 1,010,000 tokens, and targets complex analytical and problem-solving tasks that require long-context understanding.
Overview
0GM-1.0-35B-A3B (Preview-0427) is a 35.1-billion-parameter Mixture-of-Experts (MoE) model fine-tuned by 0G-AI from Qwen 3.6 35B-A3B. It activates roughly 3 billion parameters per token and was trained on 0G Compute's decentralized network. The model is designed for advanced reasoning and has a native context length of 262,144 tokens, extensible up to 1,010,000 tokens.
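The card gives the two window sizes but does not say how the extension is done. The arithmetic is simple: 1,010,000 / 262,144 ≈ 3.85, so positions must be stretched by a factor of about 3.85.

The sketch below computes that factor and classifies a request by which window it fits in. It is a minimal sketch that assumes YaRN-style RoPE scaling, as documented for earlier Qwen releases. The `hypothetical_yarn_config` helper and its whole-number rounding are illustrative assumptions, not the model's published configuration.

```python
import math

NATIVE_CONTEXT = 262_144      # native context length, per the model card
EXTENDED_CONTEXT = 1_010_000  # maximum extended context, per the model card


def min_scaling_factor(target_len, native_len=NATIVE_CONTEXT):
    """Smallest position-scaling factor that stretches native_len to target_len."""
    return target_len / native_len


def hypothetical_yarn_config(target_len=EXTENDED_CONTEXT):
    """Hypothetical rope_scaling entry (assumption): the card does not confirm it."""
    return {
        "rope_type": "yarn",
        # Round up to a whole-number factor so the window covers target_len.
        "factor": float(math.ceil(min_scaling_factor(target_len))),
        "original_max_position_embeddings": NATIVE_CONTEXT,
    }


def context_mode(prompt_tokens, max_new_tokens):
    """Classify a request by whether it fits the native or the extended window."""
    total = prompt_tokens + max_new_tokens
    if total <= NATIVE_CONTEXT:
        return "native"
    if total <= EXTENDED_CONTEXT:
        return "extended"
    return "too_long"


if __name__ == "__main__":
    print(round(min_scaling_factor(EXTENDED_CONTEXT), 3))
    print(hypothetical_yarn_config())
    print(context_mode(300_000, 8_192))
```

Rounding the factor up covers the full advertised window, because 4 × 262,144 = 1,048,576. Whether the released model uses exactly this value should be checked against its own configuration files.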
Key Capabilities & Performance
- Strong Reasoning: Reported scores include MMLU-Pro (77.62%), AIME 2026 (83.33%), GSM-8K (96.82%), and MATH-500 (95.80%).
- MMLU-Pro Gains: Outperforms its base model, Qwen 3.6 35B-A3B, in 12 of 14 MMLU-Pro subjects. Notable gains are in physics (+5.1 points), philosophy (+4.8), engineering (+4.4), and computer science (+2.5).
- Efficient Token Usage: On MATH-500, achieves higher accuracy than the Qwen 3.6 dense baseline while producing slightly shorter reasoning chains (fewer tokens).
- Extended Context: Uses a default context length of 262,144 tokens. The card describes this window as crucial for preserving the model's "thinking capabilities" on complex tasks.
- MoE Architecture: Uses 256 experts, with 8 routed experts and 1 shared expert active per token.
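The expert counts above describe top-k routing. For each token, a router scores all 256 experts. Only the 8 highest-scoring experts run, and their outputs are mixed with softmax weights. One shared expert runs for every token. Because only 9 expert networks run per token, the active-parameter count (~3B) stays far below the total (35.1B).

The pure-Python sketch below illustrates the general technique and is not 0G-AI's or Qwen's implementation. Several details are assumptions:

- the toy experts;
- renormalizing the softmax over only the selected experts;
- adding the shared expert with unit weight (real implementations may gate it).

```python
import math
import random

NUM_EXPERTS = 256  # routed experts, per the model card
TOP_K = 8          # routed experts active per token, per the model card


def top_k_route(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their logits."""
    # Sort expert indices by descending logit (ties broken by lower index).
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: (-router_logits[i], i))[:k]
    # Renormalized top-k gating (assumption): softmax over selected logits only.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]


def moe_forward(x, router_logits, experts, shared_expert, k=TOP_K):
    """Weighted sum of k routed experts plus one always-active shared expert."""
    chosen, weights = top_k_route(router_logits, k)
    # The shared expert runs for every token (unit weight here; an assumption).
    out = list(shared_expert(x))
    for i, w in zip(chosen, weights):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, chosen


if __name__ == "__main__":
    # Toy experts: expert i scales its input by (i + 1). Purely illustrative.
    experts = [lambda v, s=i + 1: [s * vi for vi in v] for i in range(NUM_EXPERTS)]
    shared = lambda v: list(v)  # toy shared expert: identity
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    y, used = moe_forward([1.0, -0.5], logits, experts, shared)
    print("experts used:", sorted(used))
    print("expert networks run:", len(used) + 1, "of", NUM_EXPERTS + 1)
```

Evaluating only the chosen experts is what keeps per-token compute proportional to the active parameters rather than the full 35.1B.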
Use Cases
- Complex Problem Solving: Suited to tasks that require deep reasoning and analysis, as its benchmark results suggest.
- High-Context Applications: Suitable for scenarios demanding extensive contextual understanding due to its large context window.
- Research & Development: Provides a strong baseline for further fine-tuning or research into MoE models and reasoning tasks.