TMLR-Group-HF/Majority-Voting-Qwen3-8B-Base-DAPO14k

Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Oct 3, 2025 · License: MIT · Architecture: Transformer

TMLR-Group-HF/Majority-Voting-Qwen3-8B-Base-DAPO14k is an 8-billion-parameter Qwen3-based language model trained on the DAPO-14k dataset. It is a product of research into stable self-supervised reinforcement learning for eliciting reasoning in large language models. Its main distinction is its training methodology, which uses a majority-voting signal to strengthen reasoning. It is intended for tasks requiring advanced reasoning, as explored in the associated research paper.


Model Overview

This model, Majority-Voting: Qwen3-8B-Base-DAPO14k, is an 8 billion parameter language model built upon the Qwen3 architecture. It has been specifically trained using the DAPO-14k dataset as part of research into advanced reasoning capabilities in large language models.
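Since this is a standard Qwen3-based causal language model, it should load with the usual Hugging Face Transformers API. The sketch below is a minimal, hedged example: the plain question/answer prompt format is an assumption (the checkpoint builds on a base model, not a chat-tuned one), and generation settings are illustrative only.

```python
# Minimal sketch: loading and prompting the checkpoint with Transformers.
# The prompt template is an assumption for a base (non-chat) model.
MODEL_ID = "TMLR-Group-HF/Majority-Voting-Qwen3-8B-Base-DAPO14k"

def build_prompt(question: str) -> str:
    # Simple completion-style prompt; adjust to your own evaluation setup.
    return f"Question: {question}\nAnswer:"

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(build_prompt("What is 17 * 24?"), return_tensors="pt")
    inputs = inputs.to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```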

Key Capabilities

  • Enhanced Reasoning: The model's training incorporates a majority-voting mechanism, a technique explored in the paper "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models" (arXiv:2508.00410). This method aims to improve the model's ability to perform complex reasoning tasks.
  • Self-supervised RL: It leverages stable self-supervised reinforcement learning, a novel approach to training that allows the model to learn and refine its reasoning processes without extensive human supervision.
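At inference time, the majority-voting idea reduces to self-consistency: sample several completions, extract each final answer, and keep the most frequent one. The snippet below is a minimal sketch of that aggregation step only; the paper's co-rewarding training signal is more involved, and answer extraction from a full reasoning trace is left to the caller.

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Return the most frequent final answer among sampled completions.

    `answers` is assumed to already hold the extracted final answers
    from N independently sampled reasoning traces.
    """
    counts = Counter(a.strip() for a in answers)
    winner, _ = counts.most_common(1)[0]
    return winner

# e.g. five extracted final answers for "What is 17 * 24?"
print(majority_vote(["408", "408", "401", "408", "418"]))  # -> 408
```

Ties fall back to `Counter.most_common` ordering (first-seen answer wins); a production harness might instead re-sample or break ties by model log-probability.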

Good For

  • Research in LLM Reasoning: Ideal for researchers and developers interested in exploring and applying advanced reasoning techniques in large language models.
  • Applications Requiring Logical Inference: Suitable for use cases where robust logical inference and problem-solving are critical, benefiting from its specialized training methodology.
  • Experimentation with Co-rewarding: Provides a practical implementation for those looking to experiment with the 'Co-rewarding' framework for eliciting reasoning.