Siyuc/INFUSER-Qwen3-8B-base

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 5, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

Siyuc/INFUSER-Qwen3-8B-base is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B-Base with the INFUSER iterative co-training framework. It is trained on questions generated adaptively from unstructured documents, with the goal of improving reasoning. It reports higher scores than its base model and other reasoning-focused models on general, mathematical, medical, and coding reasoning benchmarks, and it targets complex reasoning tasks, particularly in academic and technical domains.


INFUSER-Qwen3-8B-base: Enhanced Reasoning Model

This model, developed by Siyuc, is an 8-billion-parameter language model based on Qwen/Qwen3-8B-Base. What sets it apart is its training with INFUSER (Influence-Guided Self-Evolution Improves Reasoning), an iterative co-training framework. INFUSER co-evolves two components: a Generator, which drafts questions from unstructured documents, and a Solver (this model), which improves by training on those adaptively generated questions. An optimizer-aware influence score guides which questions are used, so the learning curriculum changes as the Solver improves.
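The idea of ranking generated questions by an influence score can be sketched in a few lines. This card does not give INFUSER's actual formula, so the sketch below substitutes a generic stand-in: a dot product between a candidate question's gradient and a reference gradient, scaled by an Adam-style second-moment estimate. The names influence_score, select_questions, and the toy gradient vectors are all hypothetical and are not taken from the INFUSER paper.

```python
import math

def influence_score(candidate_grad, reference_grad, second_moment, eps=1e-8):
    """Stand-in score (an assumption, not INFUSER's formula):
    dot product of two gradients, preconditioned Adam-style."""
    return sum(
        g * r / (math.sqrt(v) + eps)
        for g, r, v in zip(candidate_grad, reference_grad, second_moment)
    )

def select_questions(candidates, reference_grad, second_moment, k):
    """Keep the k candidate questions with the highest stand-in score."""
    ranked = sorted(
        candidates,
        key=lambda c: influence_score(c["grad"], reference_grad, second_moment),
        reverse=True,
    )
    return [c["question"] for c in ranked[:k]]

# Toy data: three generated questions with made-up 2-dimensional gradients.
candidates = [
    {"question": "q1", "grad": [1.0, 0.0]},
    {"question": "q2", "grad": [0.0, 1.0]},
    {"question": "q3", "grad": [-1.0, 0.0]},
]
reference_grad = [1.0, 0.5]   # hypothetical gradient on held-out reasoning data
second_moment = [1.0, 1.0]    # hypothetical optimizer second-moment estimate

selected = select_questions(candidates, reference_grad, second_moment, k=2)
print(selected)
```

In a full co-training loop, the selected questions would be used to train the Solver, and the Generator would then draft a new batch against the updated Solver. Those steps are omitted here because the card does not describe them in detail.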

Key Capabilities & Performance

The INFUSER-Qwen3-8B-base model shows notable improvements across various reasoning benchmarks:

  • General Reasoning: Achieves 41.66% on general reasoning benchmarks, outperforming its base model (34.43%) and other reasoning models such as R-Zero and AZR.
  • Math & Physics Reasoning: Scores 33.30%, up from the base model's 26.08%.
  • Medical Reasoning: Scores 40.52% on medical reasoning benchmarks.
  • Coding: Achieves 53.66% on coding benchmarks, surpassing the base model's 50.59% and remaining competitive with other specialized models.

Specific benchmark scores include 67.81% on MMLU-Pro, 84.25% on MATH500, and 78.86% on HumanEval+.
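To put the reported averages in perspective, the short script below computes the absolute gain (in percentage points) and the relative gain over the base model for the three categories where a base score is listed. The scores are copied from the list above; the helper name gains is illustrative. Medical Reasoning is left out because no base-model score is given for it.

```python
# (fine-tuned score, base score) in percent, as listed on this page.
scores = {
    "General Reasoning": (41.66, 34.43),
    "Math & Physics Reasoning": (33.30, 26.08),
    "Coding": (53.66, 50.59),
}

def gains(tuned, base):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = tuned - base
    relative = 100.0 * absolute / base
    return round(absolute, 2), round(relative, 1)

for name, (tuned, base) in scores.items():
    absolute, relative = gains(tuned, base)
    print(f"{name}: +{absolute:.2f} points ({relative:.1f}% relative)")
```

The gains work out to roughly 7.2 points for both general and math and physics reasoning, and about 3.1 points for coding.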

Good For

  • Complex Reasoning Tasks: Ideal for applications requiring advanced logical deduction and problem-solving.
  • Academic & Technical Domains: Excels in areas like mathematics, physics, and medical question answering.
  • Code Generation & Understanding: Strong performance in coding benchmarks suggests utility for development-related tasks.
  • Research & Development: Provides a robust foundation for further fine-tuning or research into self-improving LLMs.
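For local experimentation, a minimal loading sketch with the Hugging Face transformers library might look like the following. It assumes the weights are published on the Hugging Face Hub under the ID shown in the title and load with the standard AutoModelForCausalLM and AutoTokenizer classes; this page does not confirm either. The prompt format in build_prompt is also an assumption: since the model is fine-tuned from a base checkpoint, a plain completion-style prompt is used rather than a chat template.

```python
MODEL_ID = "Siyuc/INFUSER-Qwen3-8B-base"  # assumed Hugging Face Hub repository ID

def build_prompt(question: str) -> str:
    """Build a plain completion-style prompt (hypothetical format)."""
    return f"Question: {question.strip()}\nAnswer:"

def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate a continuation for one question."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" needs the accelerate package to be installed.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example call (downloads the 8B weights on first use):
# print(generate_answer("What is the derivative of x**2?"))
```

Loading the model inside the function keeps the sketch short, but it reloads the weights on every call; for repeated use, the tokenizer and model would normally be loaded once and reused.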

For more details, refer to the INFUSER paper and the project page.