kangdawei/MMR-DAPO-8B
Text generation · Model size: 8B · Quant: FP8 · Context length: 32k · Concurrency cost: 1 · Published: Dec 7, 2025 · Architecture: Transformer

MMR-DAPO-8B is an 8 billion parameter language model developed by kangdawei, fine-tuned from DeepSeek-R1-Distill-Llama-8B. It was trained with DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization), a reinforcement learning method, on the knoveleng/open-rs dataset, and specializes in conversational response generation. With a 32,768-token context length, the model is suited to generating nuanced, contextually relevant text in interactive applications.
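The card itself does not include usage code. A minimal inference sketch with Hugging Face `transformers` might look like the following; the repository id is taken from the card, but the availability of a chat template and the exact generation settings are assumptions, not confirmed details:

```python
# Hypothetical usage sketch for MMR-DAPO-8B via Hugging Face transformers.
# Assumptions: the checkpoint is downloadable under this repo id and, like its
# DeepSeek-R1-Distill-Llama-8B base, ships a chat template in its tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "kangdawei/MMR-DAPO-8B"


def build_prompt(tokenizer, user_message: str) -> str:
    """Render a single-turn conversation with the tokenizer's chat template."""
    messages = [{"role": "user", "content": user_message}]
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


def main() -> None:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    prompt = build_prompt(tokenizer, "Summarize the benefits of a 32k context window.")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Decode only the newly generated tokens, not the echoed prompt.
    outputs = model.generate(**inputs, max_new_tokens=256)
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

Downloading an 8B FP8 checkpoint requires substantial disk and GPU memory; for hosted inference, the platform's own API endpoint would be the lighter-weight option.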
