Name: pixas/Miner-4B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: pixas

Overview

Miner-4B is a 4 billion parameter reasoning model developed by pixas, utilizing the MINER (Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models) framework. This model addresses the data inefficiency of critic-free reinforcement learning methods, especially when all sampled rollouts are correct and provide limited learning signals. MINER introduces two core concepts: token-level focal credit assignment to amplify learning on uncertain tokens and adaptive advantage calibration for stable integration of intrinsic and verifiable rewards.

Key Capabilities

Enhanced Reasoning: Improves performance on complex reasoning and problem-solving tasks.
Data-Efficient RL: Leverages intrinsic uncertainty for self-supervised reward signals, reducing reliance on auxiliary reward models.
Robust Training: Evaluated on six reasoning benchmarks, demonstrating stronger sample efficiency and accuracy compared to baseline GRPO variants.

Intended Use Cases

Research and Experimentation: Ideal for academic research on RL for reasoning models.
Mathematical and Verifiable Reasoning: Suitable for tasks requiring precise logical deduction.
Model Evaluation: Useful for evaluating reasoning benchmarks and conducting ablation studies based on the MINER framework.
Further Finetuning: Serves as a checkpoint for additional finetuning or post-training efforts.

Overview

Overview

Key Capabilities

Intended Use Cases

Full Model Card (README)