Name: TMLR-Group-HF/GT-Qwen3-8B-Base-MATH API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: TMLR-Group-HF

Model Overview

TMLR-Group-HF/GT-Qwen3-8B-Base-MATH is an 8-billion-parameter model based on the Qwen3-Base architecture. TMLR-Group-HF trained it with the GRPO Ground Truth method on the MATH training set. The training approach is described in the research paper "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models" (arXiv:2508.00410), which studies stable self-supervised reinforcement learning as a way to strengthen reasoning in large language models.

Key Capabilities

Enhanced Reasoning: Optimized for complex reasoning tasks, particularly in mathematical domains.
Specialized Training: Utilizes the GRPO Ground Truth method with a dedicated MATH dataset.
Self-supervised RL: Incorporates Co-rewarding for stable self-supervised reinforcement learning.
Context Length: Supports a context window of 32,768 tokens.
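
The context length above can be turned into a simple token budget check. The sketch below assumes that prompt tokens and generated tokens share the same 32,768-token window, which is typical for decoder-only models but is not stated in this listing. The helper names are hypothetical, and the token counts would have to come from the model's own tokenizer.

```python
# Context length stated in this listing.
CONTEXT_WINDOW = 32768


def max_new_tokens_allowed(prompt_tokens: int,
                           context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many tokens can still be generated after the prompt.

    Assumption: prompt and completion share one window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_window - prompt_tokens, 0)


def fits(prompt_tokens: int, max_new_tokens: int,
         context_window: int = CONTEXT_WINDOW) -> bool:
    """Check whether a prompt plus a requested completion fits the window."""
    return prompt_tokens + max_new_tokens <= context_window


if __name__ == "__main__":
    # Hypothetical prompt of 30,000 tokens.
    print(max_new_tokens_allowed(30000))
    print(fits(30000, 4096))
```

A check like this is useful before sending long worked solutions or multi-problem prompts, since a request that overruns the window would otherwise be truncated or rejected.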

Good For

Mathematical Problem Solving: Excels in tasks requiring logical and mathematical reasoning.
Research in Reasoning: Ideal for researchers exploring advanced reasoning elicitation techniques in LLMs.
Complex Task Handling: Suitable for applications that benefit from a large context window and robust reasoning.
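
For the mathematical problem-solving use case, a request could be prepared along these lines. This is a minimal sketch that only builds a JSON body in the style of OpenAI-compatible text-completion APIs. Whether the hosting service accepts exactly these fields is an assumption, and the prompt template and the `build_math_request` helper are hypothetical rather than anything the model authors prescribe.

```python
import json

# Model identifier as given in this listing.
MODEL_ID = "TMLR-Group-HF/GT-Qwen3-8B-Base-MATH"


def build_math_request(problem: str, max_tokens: int = 1024,
                       temperature: float = 0.0) -> dict:
    """Build a completion-style request body for a single math problem.

    Assumption: the field names follow the common OpenAI-compatible
    completions shape (model, prompt, max_tokens, temperature).
    """
    # Hypothetical plain-text prompt template for a base-style model.
    prompt = (
        f"Problem: {problem.strip()}\n"
        "Solve step by step and state the final answer.\n"
        "Solution:"
    )
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


body = build_math_request("What is the sum of the first 10 positive integers?")

if __name__ == "__main__":
    # The body can be serialized and sent with any HTTP client.
    print(json.dumps(body, indent=2))
```

A temperature of 0.0 is chosen here to favor deterministic answers, which makes math outputs easier to compare across runs; sampling settings are otherwise a matter of preference.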
