kangdawei/DAPO-8B
Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Ctx length: 32k · Published: Dec 11, 2025 · Architecture: Transformer

kangdawei/DAPO-8B is an 8-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B. It was trained with the DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) reinforcement learning method on the knoveleng/open-rs dataset, specializing in conversational response generation. This reinforcement learning fine-tuning is intended to improve the model's interactive capabilities and produce more nuanced, contextually relevant responses.
