kangdawei/MMR-Sigmoid-DAPO

Text generation · 1.5B parameters · BF16 · 32k context · Transformer · Published Dec 18, 2025

The MMR-Sigmoid-DAPO model, developed by kangdawei, is a 1.5 billion parameter language model fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B. It was trained with the DAPO reinforcement learning method on the knoveleng/open-rs dataset and specializes in generating conversational, engaging text. With a context length of 131072 tokens, the model is suited to interactive dialogue and response generation tasks.


Model Overview

MMR-Sigmoid-DAPO is a 1.5 billion parameter language model developed by kangdawei. It is a fine-tuned variant of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, optimized for generating conversational responses.

Key Capabilities

  • Conversational Text Generation: Excels at producing engaging and contextually relevant responses in dialogue settings.
  • DAPO Training: Utilizes DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization), a reinforcement learning method detailed in the paper "DAPO: An Open-Source LLM Reinforcement Learning System at Scale" (arXiv:2503.14476), for enhanced performance.
  • Large Context Window: Features a substantial context length of 131072 tokens, allowing for processing and generating longer, more coherent interactions.
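The DAPO paper cited above decouples the lower and upper clipping ranges of the PPO-style surrogate objective ("clip-higher") and averages the loss over all tokens. A minimal NumPy sketch of that objective follows; the epsilon values are illustrative defaults from the paper, not this model's actual training hyperparameters:

```python
import numpy as np

def dapo_clipped_objective(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """Token-level PPO-style surrogate with decoupled clip ranges.

    ratio:     pi_theta(token) / pi_old(token), per token
    advantage: group-normalized advantage estimate, per token
    eps_low / eps_high: asymmetric clip bounds ("clip-higher" in DAPO)
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # Pessimistic minimum over the unclipped and clipped terms, as in PPO.
    per_token = np.minimum(ratio * advantage, clipped * advantage)
    # DAPO averages over all tokens in the batch (token-level loss).
    return per_token.mean()

# A ratio of 1.5 with positive advantage is clipped at 1 + eps_high = 1.28.
print(dapo_clipped_objective([1.5], [1.0]))  # 1.28
```

Raising `eps_high` above `eps_low` lets low-probability tokens grow faster during exploration, which the paper argues counteracts entropy collapse.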

Training Details

The model was fine-tuned on the knoveleng/open-rs dataset using the TRL (Transformer Reinforcement Learning) framework. This reinforcement learning approach optimizes the model's outputs against reward signals, making it particularly suitable for interactive applications.
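One component of the DAPO recipe is dynamic sampling: prompt groups whose sampled rollouts all receive the same reward yield zero group-normalized advantage, so they are filtered out before the gradient step and replaced by fresh samples. A minimal sketch of that filter (the function name and data shapes here are illustrative, not TRL API):

```python
def filter_uninformative_groups(reward_groups):
    """Keep only prompt groups with non-uniform rewards.

    reward_groups: list of per-prompt reward lists, one value per rollout.
    Groups where every rollout got the same reward (e.g. all correct or
    all wrong) carry no advantage signal, so DAPO-style dynamic sampling
    drops them rather than spend a gradient step on them.
    """
    return [group for group in reward_groups if len(set(group)) > 1]

# Three prompts, three rollouts each: only the middle group is informative.
groups = [[1, 1, 1], [1, 0, 1], [0, 0, 0]]
print(filter_uninformative_groups(groups))  # [[1, 0, 1]]
```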

Recommended Use Cases

This model is well-suited for applications requiring high-quality, engaging conversational AI, such as chatbots, interactive storytelling, and dialogue systems where nuanced and human-like responses are crucial.