kangdawei/DAPO

Hugging Face · Text Generation
Model Size: 1.5B · Quantization: BF16 · Context Length: 32k · Architecture: Transformer · Concurrency Cost: 1

kangdawei/DAPO is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was trained with the DAPO reinforcement learning method on the knoveleng/open-rs dataset and specializes in generating diverse, thoughtful responses to open-ended questions. The model is optimized for conversational AI and creative text generation tasks.


Model Overview

kangdawei/DAPO is a 1.5 billion parameter language model, a fine-tuned variant of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It distinguishes itself through its training methodology: the DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) reinforcement learning method, as detailed in the paper "DAPO: An Open-Source LLM Reinforcement Learning System at Scale" (arXiv:2503.14476). Training leveraged the knoveleng/open-rs dataset, indicating a focus on generating responses to open-ended or conversational prompts.
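
For context, the objective from the cited DAPO paper can be sketched as follows; this is a summary in the paper's notation, not a statement of the exact hyperparameters used for this checkpoint. DAPO keeps GRPO-style group-normalized advantages, decouples the lower and upper clipping ranges, and resamples prompt groups whose sampled answers are all correct or all incorrect:

```latex
J_{\mathrm{DAPO}}(\theta) =
\mathbb{E}_{(q,a)\sim\mathcal{D},\ \{o_i\}_{i=1}^{G}\sim\pi_{\theta_{\mathrm{old}}}(\cdot\mid q)}
\left[
  \frac{1}{\sum_{i=1}^{G}|o_i|}
  \sum_{i=1}^{G}\sum_{t=1}^{|o_i|}
  \min\!\Big(
    r_{i,t}(\theta)\,\hat{A}_{i,t},\
    \operatorname{clip}\big(r_{i,t}(\theta),\,1-\varepsilon_{\mathrm{low}},\,1+\varepsilon_{\mathrm{high}}\big)\,\hat{A}_{i,t}
  \Big)
\right],
\qquad
\hat{A}_{i,t} = \frac{R_i - \operatorname{mean}(\{R_j\}_{j=1}^{G})}{\operatorname{std}(\{R_j\}_{j=1}^{G})}
```

subject to the dynamic-sampling constraint that each kept group contains at least one correct and one incorrect sampled answer, where r_{i,t}(θ) is the per-token importance ratio between the current policy and the sampling policy.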

Key Capabilities

  • Reinforcement Learning Optimization: Trained with the DAPO method, which is designed to stabilize large-scale RL fine-tuning and improve response quality over the base model.
  • Fine-tuned for Open-ended Responses: Specifically trained on the knoveleng/open-rs dataset, making it suitable for generating creative and diverse answers to complex prompts.
  • Efficient Size: At 1.5 billion parameters, it offers a balance between performance and computational efficiency, making it accessible for a range of applications (a minimal loading and generation sketch follows this list).
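
A minimal usage sketch with the Hugging Face transformers library. The repo id, prompt, and sampling settings below are illustrative assumptions based on this card, not verified defaults:

```python
# Minimal usage sketch (assumes the checkpoint is published as "kangdawei/DAPO"
# and follows the standard causal-LM layout of its DeepSeek-R1-Distill-Qwen-1.5B base).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kangdawei/DAPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

prompt = "What would a city designed entirely around sound, rather than sight, look like?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sampling settings are illustrative, not the author's recommended values.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```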

Good For

  • Conversational AI: Generating engaging and coherent dialogue (see the chat-template sketch after this list).
  • Creative Text Generation: Crafting imaginative responses to abstract or philosophical questions.
  • Prototyping and Research: Exploring the impact of DAPO on smaller, yet capable, language models.
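
For dialogue-style use, the sketch below formats a turn with the standard transformers chat-template API; it assumes the tokenizer ships a chat template inherited from the DeepSeek-R1-Distill-Qwen-1.5B base, and the message content and settings are illustrative:

```python
# Conversational sketch (assumes "kangdawei/DAPO" exposes a chat template
# via its tokenizer, as the DeepSeek-R1-Distill-Qwen-1.5B base does).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kangdawei/DAPO"  # assumed repo id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user",
     "content": "If memories could be traded like currency, which one would be most valuable?"},
]
# Build the prompt from the chat template and append the generation prompt.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```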