Name: TMLR-Group-HF/Co-rewarding-II-Qwen3-8B-Base-DAPO14k API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: TMLR-Group-HF

Overview

Co-rewarding-II-Qwen3-8B-Base-DAPO14k is an 8 billion parameter large language model built upon the Qwen3-8B-Base architecture. Developed by Co-rewarding-II, this model distinguishes itself through its specialized training regimen, utilizing the DAPO-14k dataset. The integration of DAPO-14k suggests an optimization for tasks that benefit from data-augmented policy optimization techniques, aiming to enhance performance in specific areas.

Key Capabilities

Specialized Training: Leverages the DAPO-14k dataset for focused training, potentially leading to improved performance in areas related to data-augmented policy optimization.
Base Architecture: Built on the robust Qwen3-8B-Base model, providing a strong foundation for language understanding and generation.
Context Length: Supports a substantial context window of 32768 tokens, enabling the processing of longer inputs and maintaining coherence over extended interactions.

Good For

Research in Co-rewarding: Ideal for researchers and developers interested in exploring or applying co-rewarding mechanisms, as indicated by the model's origin and the associated GitHub repository [https://github.com/tmlr-group/Co-rewarding].
Applications requiring DAPO-14k specific knowledge: Suitable for use cases where the unique characteristics and data distribution of the DAPO-14k training set are advantageous.
General language tasks: While specialized, its Qwen3-8B-Base foundation allows for competent performance across a range of general natural language processing tasks.

Overview

Overview

Key Capabilities

Good For

Full Model Card (README)