TMLR-Group-HF/GT-Qwen3-8B-Base-DAPO14k
TMLR-Group-HF/GT-Qwen3-8B-Base-DAPO14k is an 8-billion-parameter Qwen3-Base model developed by TMLR-Group-HF and fine-tuned on the DAPO-14k dataset. The model uses the Co-rewarding self-supervised reinforcement learning framework to strengthen reasoning in large language models. It is optimized for mathematical reasoning tasks and shows improved training stability and performance over other self-rewarding baselines. The model supports a context length of 32768 tokens.
Model Overview
TMLR-Group-HF/GT-Qwen3-8B-Base-DAPO14k is an 8-billion-parameter Qwen3-Base model from TMLR-Group-HF, fine-tuned on the DAPO-14k dataset. The model is a direct result of research into Co-rewarding, a self-supervised reinforcement learning (RL) framework designed to improve the reasoning capabilities of large language models (LLMs). The core innovation of Co-rewarding is that it addresses the training instability common in self-rewarding methods by drawing complementary supervision from multiple perspectives.
Key Capabilities
- Enhanced Reasoning: Specifically engineered to boost the reasoning abilities of LLMs, particularly in complex problem-solving scenarios.
- Stable Self-supervised Learning: Employs the Co-rewarding framework, which includes data-side (Co-rewarding-I) and model-side (Co-rewarding-II) instantiations, to ensure more stable training compared to traditional self-rewarding approaches.
- Mathematical Reasoning: Demonstrates significant performance improvements on various mathematical reasoning benchmarks, often outperforming RLVR methods that rely on ground-truth labels.
- Large Context Window: Supports a context length of 32768 tokens, enough to accommodate long problem statements and extended multi-step solutions.
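As a minimal sketch of how the model might be loaded and prompted with the Hugging Face transformers library. The prompt template and generation settings below are illustrative assumptions, not values documented by the authors; consult the Co-rewarding repository for the exact setup used in training and evaluation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TMLR-Group-HF/GT-Qwen3-8B-Base-DAPO14k"
MAX_CONTEXT = 32768  # context length stated in the model card


def build_prompt(problem: str) -> str:
    """Wrap a math problem in a simple step-by-step reasoning prompt.

    This template is an assumption for illustration; the repository
    may use a different prompt format.
    """
    return (
        f"Problem: {problem}\n"
        "Please reason step by step and put your final answer in \\boxed{}.\n"
    )


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    prompt = build_prompt("What is the sum of the first 100 positive integers?")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Stay within the 32768-token context window.
    assert inputs["input_ids"].shape[1] <= MAX_CONTEXT

    outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is chosen here only to make runs reproducible; sampling settings for best benchmark performance would need to match the paper's evaluation protocol.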
Good For
- Research in RL and LLMs: Ideal for researchers exploring advanced self-supervised learning techniques and reinforcement learning applications in language models.
- Mathematical Problem Solving: Suited for applications requiring robust mathematical reasoning and logical deduction.
- Benchmarking: Can be used as a strong baseline for evaluating new reasoning-focused LLM techniques.
For more technical details and the underlying research, refer to the paper Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models and the official GitHub repository.