Name: empero-ai/openNemo-Cascade-2-30B-A3B API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: empero-ai

openNemo-Cascade-2-30B-A3B: Pure PyTorch MoE for Advanced Reasoning

Empero AI's openNemo-Cascade-2-30B-A3B is a 30.87-billion-parameter Mixture-of-Experts (MoE) model with approximately 3 billion active parameters per token. It is a drop-in, pure-PyTorch replacement for NVIDIA's Nemotron-Cascade-2-30B-A3B, designed to eliminate dependencies on external CUDA kernels such as mamba-ssm and causal-conv1d.

Key Differentiators & Capabilities

Enhanced Quantization & Fine-tuning: Because the CUDA kernels are replaced with native PyTorch operations, the model is fully compatible with bitsandbytes 4-bit quantization and QLoRA fine-tuning on consumer GPUs. Quantized to 4 bits, it loads in approximately 17 GB of VRAM.
Preserved Performance: It keeps the original Nemotron-Cascade-2 architecture and weights, so its performance should match the original model. The original model achieved gold-medal-level scores at IMO 2025 (35 points) and IOI 2025 (439.3 points).
Hybrid Architecture: The model is a 52-layer hybrid that combines Mamba2 SSM blocks, Mixture-of-Experts blocks (128 routed experts, with the top 6 selected per token), and Grouped Query Attention blocks.
Simplified Deployment: No mamba-ssm or causal-conv1d installation is required, simplifying setup and avoiding common CUDA version conflicts.
Memory Optimization: Includes an automatic fix for asynchronous weight loading that prevents Out-of-Memory (OOM) errors during 4-bit quantization on GPUs with limited VRAM.
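As a rough sanity check on the ~17 GB figure, the sketch below estimates weight storage alone. An MoE model must keep every expert resident in memory even though only about 3B parameters are active per token, so memory scales with the 30.87B total. The sketch assumes every parameter is stored at the stated bit width; attributing the gap between roughly 15.4 GB of 4-bit weights and ~17 GB to modules left in 16-bit, quantization constants, and runtime buffers is an assumption, not something the model card states.

```python
# Back-of-the-envelope weight-memory estimate (a sketch, not a measurement).
# Assumption: every parameter is stored at the stated bit width; real 4-bit
# loading keeps some modules in 16-bit and adds quantization constants.

TOTAL_PARAMS = 30.87e9  # total parameters, from the model card
ACTIVE_PARAMS = 3e9     # approximate active parameters per token


def weight_gb(num_params: float, bits_per_param: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


bf16_gb = weight_gb(TOTAL_PARAMS, 16)      # unquantized 16-bit weights
four_bit_gb = weight_gb(TOTAL_PARAMS, 4)   # 4-bit weights, ignoring overhead
active_share = ACTIVE_PARAMS / TOTAL_PARAMS  # fraction used per token

if __name__ == "__main__":
    print(f"16-bit weights: {bf16_gb:.1f} GB")
    print(f"4-bit weights:  {four_bit_gb:.1f} GB")
    print(f"active per token: {active_share:.1%} of parameters")
```

The point of the comparison: quantization shrinks the resident footprint roughly fourfold relative to 16-bit weights, while the small active share reduces per-token compute, not memory.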
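The "128 routed experts, top 6 selected" routing can be illustrated with a generic softmax top-k gate. This is a hypothetical toy in plain Python: the function names, scalar "experts", and random logits are illustrative only, and the model's actual router may use different scoring and normalization.

```python
import math
import random

NUM_EXPERTS = 128  # routed experts per MoE block (from the model card)
TOP_K = 6          # experts selected per token (from the model card)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and their mixing weights.

    Weights are a softmax over the selected logits, which equals a full
    softmax renormalized over the top-k experts.
    """
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, router_logits, experts, k=TOP_K):
    """Weighted sum of outputs from only the selected experts."""
    return sum(w * experts[i](x) for i, w in route_token(router_logits, k))


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical router logits for one token and toy scalar "experts".
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    experts = [lambda x, s=i: s * x for i in range(NUM_EXPERTS)]
    for idx, w in route_token(logits):
        print(f"expert {idx:3d}  weight {w:.3f}")
    print("output:", moe_forward(1.0, logits, experts))
```

Only 6 of the 128 expert functions run for each token, which is one reason the active parameter count (~3B) is so much smaller than the total.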

Ideal Use Cases

Advanced Reasoning & Problem Solving: Excels in complex mathematical and logical reasoning tasks, as demonstrated by its benchmark performance.
Resource-Constrained Environments: Suitable for deployment and fine-tuning on consumer-grade GPUs due to its 4-bit quantization compatibility and reduced VRAM footprint.
Research & Development: Provides a flexible, pure-PyTorch base for experimenting with MoE models, quantization, and QLoRA fine-tuning without kernel-related hurdles.
