Name: TMLR-Group-HF/Co-rewarding-III-Qwen3-8B-Base-DAPO14k API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: TMLR-Group-HF

Overview

This model, Co-rewarding-III-Qwen3-8B-Base-DAPO14k, is an 8 billion parameter Qwen3-Base model developed by TMLR-Group. Its core innovation lies in its fine-tuning approach, utilizing the Co-rewarding-III method on the DAPO14k training set. This methodology is detailed in the paper "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models" (arXiv:2508.00410).

Key Capabilities

Enhanced Reasoning: Specifically trained to improve and elicit reasoning abilities in large language models through a stable self-supervised reinforcement learning framework.
Co-rewarding Framework: Leverages the Co-rewarding-III method, a novel approach for fine-tuning, to achieve its reasoning capabilities.
Qwen3-Base Architecture: Built upon the robust Qwen3-8B-Base model, providing a strong foundation for its specialized fine-tuning.

Good For

Research in Self-supervised RL: Ideal for researchers exploring stable self-supervised reinforcement learning techniques for LLMs.
Reasoning-intensive Tasks: Suitable for applications requiring advanced logical inference and problem-solving from language models.
Benchmarking Reasoning: Can be used as a baseline or comparison model for evaluating reasoning performance in LLMs.

Overview

Overview

Key Capabilities

Good For

Full Model Card (README)