TMLR-Group-HF/Co-rewarding-II-Qwen3-8B-Base-DAPO14k

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Oct 3, 2025 | License: MIT | Architecture: Transformer | Open Weights

Co-rewarding-II-Qwen3-8B-Base-DAPO14k is an 8-billion-parameter language model released by TMLR-Group. It was obtained by training Qwen3-8B-Base with Co-rewarding-II, a variant of the Co-rewarding method, on the DAPO-14k dataset, and it supports a context length of 32,768 tokens.


Overview

Co-rewarding-II-Qwen3-8B-Base-DAPO14k is an 8-billion-parameter large language model initialized from Qwen3-8B-Base. TMLR-Group trained it with Co-rewarding-II on the DAPO-14k dataset. In the dataset name, DAPO refers to Decoupled Clip and Dynamic Sampling Policy Optimization, a reinforcement-learning recipe for LLM reasoning; judging by its name, DAPO-14k is likely a set of roughly 14k problems derived from the math training data released with that work.

Key Capabilities

  • Specialized Training: Trained with Co-rewarding-II on DAPO-14k, so it is expected to perform best on problems resembling that dataset.
  • Base Model: Initialized from Qwen3-8B-Base, a pretrained (not instruction-tuned) model that supplies its general language understanding and generation abilities.
  • Context Length: Supports a 32,768-token context window, which bounds the prompt and the generated output combined.
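As a minimal sketch of working within the 32,768-token window, the Python below clamps the generation budget so that prompt plus output never exceeds it, then runs generation with Hugging Face `transformers`. It assumes the weights are published on the Hugging Face Hub under the ID in this card's title and that `transformers`, `torch`, and `accelerate` are installed; `max_new_tokens_for` and `generate` are hypothetical helper names, not part of any released API.

```python
# Sketch: keep prompt + generated tokens within the 32,768-token context window.
# Assumption: the model is on the Hugging Face Hub under MODEL_ID and
# `transformers` + `torch` + `accelerate` (for device_map="auto") are installed.

MODEL_ID = "TMLR-Group-HF/Co-rewarding-II-Qwen3-8B-Base-DAPO14k"
CONTEXT_LENGTH = 32768  # context length stated on the model card


def max_new_tokens_for(prompt_tokens: int, requested: int,
                       context_length: int = CONTEXT_LENGTH) -> int:
    """Hypothetical helper: clamp the generation budget to the remaining window."""
    if prompt_tokens >= context_length:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens; window is {context_length}")
    return min(requested, context_length - prompt_tokens)


def generate(prompt: str, requested_new_tokens: int = 1024) -> str:
    """Hypothetical helper: load the model and generate a completion for `prompt`."""
    # Imported lazily so the budget helper above works without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_tokens = inputs["input_ids"].shape[-1]
    budget = max_new_tokens_for(prompt_tokens, requested_new_tokens)

    output_ids = model.generate(**inputs, max_new_tokens=budget)
    # Skip the echoed prompt and decode only the newly generated tokens.
    return tokenizer.decode(output_ids[0, prompt_tokens:], skip_special_tokens=True)


# Usage (downloads the full 8B checkpoint on first call):
# print(generate("Question: What is 17 * 23?\nAnswer:"))
```

For example, a 32,000-token prompt leaves at most 768 tokens for generation, whatever `requested_new_tokens` is set to. The sketch feeds plain text to the model; adapt the prompt to whatever format the Co-rewarding repository uses.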

Good For

  • Research on Co-rewarding: Useful for researchers and developers studying or applying co-rewarding methods; see the associated GitHub repository [https://github.com/tmlr-group/Co-rewarding].
  • Tasks similar to DAPO-14k: Suited to use cases whose problems resemble those in the DAPO-14k training set.
  • General language tasks: Its Qwen3-8B-Base foundation may also support general natural language processing tasks, although its additional training targeted DAPO-14k.