pixas/Miner-8B

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Apr 9, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

pixas/Miner-8B is an 8 billion parameter reasoning model developed by pixas, trained with the MINER reinforcement learning method. This method enhances data efficiency for large reasoning models by leveraging intrinsic uncertainty as a self-supervised reward signal. It is specifically designed to improve performance on reasoning and problem-solving tasks, particularly in scenarios where standard RL methods are inefficient due to homogeneous positive prompts. The model incorporates token-level focal credit assignment and adaptive advantage calibration to achieve stronger sample efficiency and accuracy on various reasoning benchmarks.

Loading preview...

Overview

pixas/Miner-8B is an 8 billion parameter reasoning model developed by pixas, specifically trained using the MINER (Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models) framework. MINER is a reinforcement learning method designed to address the data inefficiency of critic-free RL on prompts where all sampled rollouts are correct, providing little learning signal. It achieves this by utilizing the policy's intrinsic uncertainty as a self-supervised reward, eliminating the need for auxiliary reward models or additional inference-time overhead.

Key Innovations

The MINER framework introduces two core concepts:

  • Token-level focal credit assignment: This mechanism amplifies learning for uncertain and critical tokens while suppressing overconfident ones.
  • Adaptive advantage calibration: This integrates intrinsic and verifiable rewards in a stable manner.

Performance & Evaluation

Evaluated on six reasoning benchmarks, MINER demonstrates stronger sample efficiency and accuracy compared to several baseline methods, including GRPO variants. The model is a research checkpoint and its performance may vary depending on the base model, data mixture, and evaluation pipeline used.

Intended Use Cases

This model is primarily intended for research and experimental use in:

  • Reasoning and problem-solving tasks.
  • Reinforcement learning for language models.
  • Mathematical and verifiable reasoning.
  • Post-training and evaluation of large reasoning models.

Potential applications include academic research, evaluation on reasoning benchmarks, and further finetuning based on the MINER framework. Users should be aware of potential limitations, such as producing incorrect or incomplete reasoning outputs, and the model's performance being sensitive to prompt format and decoding setup.