Name: HerrHruby/MR_midtrain_9B_v3 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: HerrHruby

MR_midtrain_9B_v3: A Meta-Reasoning Powerhouse

MR_midtrain_9B_v3 is a 9 billion parameter instruction-tuned model, built upon the Qwen3.5-9B architecture, specifically engineered for advanced meta-reasoning. Developed by HerrHruby, this model integrates a unique three-stage meta-reasoning loop: MR (propose exploration directions), E (execute each direction and emit a summary), and FA (formulate the final answer). This process is facilitated by custom <direction> and <summary> special tokens.

Key Capabilities & Architecture

Meta-Reasoning Loop: Implements a sophisticated MR→E→FA loop for complex problem-solving, allowing the model to dynamically explore and synthesize information.
Base Model: Utilizes Qwen/Qwen3.5-9B as its foundation.
Architecture: Employs Qwen3_5ForConditionalGeneration, which is optimized for compatibility with both vLLM serving and verl Megatron (mbridge) RL environments, ensuring broad deployment flexibility.

Performance Highlights

MR_midtrain_9B_v3 demonstrates strong performance across challenging benchmarks, indicating its proficiency in reasoning tasks:

SODA2026: Achieves a mean score of 0.478.
IMO ProofBench: Records a pass@1 score of 0.547 and a best@3 score of 0.678 (evaluated by an official Gemini-3.1-Pro judge).
physics_papers: Attains a pass@1 score of 0.679.

These results indicate that v3 SFT matches or surpasses the best v2 RL checkpoints even before any v3 RL fine-tuning. Users should note that a temperature of 1.0 is recommended for optimal performance, as lower temperatures can lead to repetitive outputs in the 'E' step.

Overview

MR_midtrain_9B_v3: A Meta-Reasoning Powerhouse

Key Capabilities & Architecture

Performance Highlights

Full Model Card (README)