kangdawei/MMR-Sigmoid-DAPO

Text generation · 1.5B parameters · BF16 · 32k context · Transformer · Published Dec 18, 2025

The MMR-Sigmoid-DAPO model, developed by kangdawei, is a 1.5 billion parameter language model fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B. It was trained with the DAPO reinforcement learning method on the knoveleng/open-rs dataset and specializes in generating conversational, engaging text. With a context length of 131072 tokens, the model is suited to interactive dialogue and response generation tasks.


Model Overview

MMR-Sigmoid-DAPO is a 1.5 billion parameter language model developed by kangdawei. It is a fine-tuned variant of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, optimized for generating conversational responses.

Key Capabilities

  • Conversational Text Generation: Excels at producing engaging and contextually relevant responses in dialogue settings.
  • DAPO Training: Utilizes DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization), a reinforcement learning method detailed in the paper "DAPO: An Open-Source LLM Reinforcement Learning System at Scale" (arXiv:2503.14476), for enhanced performance.
  • Large Context Window: Features a substantial context length of 131072 tokens, allowing for processing and generating longer, more coherent interactions.
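The DAPO paper cited above decouples the lower and upper clipping ranges of the PPO-style surrogate objective ("clip-higher") and averages the loss over all tokens. A minimal NumPy sketch of that objective follows; the epsilon values are illustrative defaults from the paper, not this model's actual training hyperparameters:

```python
import numpy as np

def dapo_clipped_objective(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """Token-level PPO-style surrogate with decoupled clip ranges.

    ratio:     pi_theta(token) / pi_old(token), per token
    advantage: group-normalized advantage estimate, per token
    eps_low / eps_high: asymmetric clip bounds ("clip-higher" in DAPO)
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # Pessimistic minimum over the unclipped and clipped terms, as in PPO.
    per_token = np.minimum(ratio * advantage, clipped * advantage)
    # DAPO averages over all tokens in the batch (token-level loss).
    return per_token.mean()

# A ratio of 1.5 with positive advantage is clipped at 1 + eps_high = 1.28.
print(dapo_clipped_objective([1.5], [1.0]))  # 1.28
```

Raising `eps_high` above `eps_low` lets low-probability tokens grow faster during exploration, which the paper argues counteracts entropy collapse.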

Training Details

The model was fine-tuned on the knoveleng/open-rs dataset using the TRL (Transformer Reinforcement Learning) framework. This reinforcement learning approach optimizes the model's outputs against reward signals, making it particularly suitable for interactive applications.
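One component of the DAPO recipe is dynamic sampling: prompt groups whose sampled rollouts all receive the same reward yield zero group-normalized advantage, so they are filtered out before the gradient step and replaced by fresh samples. A minimal sketch of that filter (the function name and data shapes here are illustrative, not TRL API):

```python
def filter_uninformative_groups(reward_groups):
    """Keep only prompt groups with non-uniform rewards.

    reward_groups: list of per-prompt reward lists, one value per rollout.
    Groups where every rollout got the same reward (e.g. all correct or
    all wrong) carry no advantage signal, so DAPO-style dynamic sampling
    drops them rather than spend a gradient step on them.
    """
    return [group for group in reward_groups if len(set(group)) > 1]

# Three prompts, three rollouts each: only the middle group is informative.
groups = [[1, 1, 1], [1, 0, 1], [0, 0, 0]]
print(filter_uninformative_groups(groups))  # [[1, 0, 1]]
```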

Recommended Use Cases

This model is well-suited for applications requiring high-quality, engaging conversational AI, such as chatbots, interactive storytelling, and dialogue systems where nuanced and human-like responses are crucial.