kangdawei/DAPO-No-DS-8B

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Dec 7, 2025 · Architecture: Transformer · Cold

The kangdawei/DAPO-No-DS-8B model is an 8 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B. It was trained with DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization), a reinforcement learning method, on the knoveleng/open-rs dataset, and specializes in conversational response generation. With a context length of 32,768 tokens, it is suited to generating nuanced, contextually relevant text responses.


Model Overview

kangdawei/DAPO-No-DS-8B is an 8 billion parameter language model derived from deepseek-ai/DeepSeek-R1-Distill-Llama-8B. It was fine-tuned with DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization), the reinforcement learning method described in "DAPO: An Open-Source LLM Reinforcement Learning System at Scale" (arXiv:2503.14476). Training used the knoveleng/open-rs dataset, with a focus on improving the quality of conversational responses.
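As background on the training method: DAPO modifies the PPO-style clipped surrogate by decoupling the clip range, allowing a higher upper bound (ε_high) than lower bound (ε_low) to encourage exploration. The sketch below illustrates that per-token objective; the ε values shown are the defaults reported in the DAPO paper, not settings confirmed for this model's training run.

```python
def dapo_token_objective(ratio: float, advantage: float,
                         eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """PPO-style clipped surrogate with DAPO's decoupled clip range.

    ratio: importance ratio pi_theta(token) / pi_old(token)
    advantage: advantage estimate for the token
    eps_low / eps_high: decoupled clip bounds (paper defaults, assumed here)
    """
    # Clamp the ratio into the asymmetric interval [1 - eps_low, 1 + eps_high].
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Take the pessimistic (minimum) of the unclipped and clipped surrogates.
    return min(ratio * advantage, clipped_ratio * advantage)

if __name__ == "__main__":
    # Positive advantage: the gain is capped at (1 + eps_high) * advantage.
    print(dapo_token_objective(1.5, 1.0))   # → 1.28
    # Negative advantage with ratio below 1 - eps_low: the clipped
    # surrogate (0.8 * -1.0) is the more pessimistic term.
    print(dapo_token_objective(0.5, -1.0))  # → -0.8
```

The raised upper bound is what lets low-probability tokens with positive advantages receive larger updates than symmetric PPO clipping would allow.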

Key Characteristics

  • Base Model: Fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B.
  • Training Method: Employs DAPO, a reinforcement learning approach for large language models.
  • Dataset: Trained on the knoveleng/open-rs dataset, suggesting an optimization for open-ended conversational tasks.
  • Context Length: Supports a substantial context window of 32768 tokens.
  • Frameworks: Developed using TRL (Transformer Reinforcement Learning) version 0.16.0.dev0, Transformers 4.57.1, PyTorch 2.5.1, Datasets 3.2.0, and Tokenizers 0.22.1.

Use Cases

This model is particularly well-suited for applications requiring:

  • Conversational AI: Generating coherent and contextually appropriate responses in dialogue systems.
  • Open-ended Text Generation: Creating diverse and natural language outputs based on user prompts.
  • Research in RLHF: Serving as a practical example of a model trained with the DAPO method.
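For the conversational use cases above, the model can be queried through the Hugging Face Transformers text-generation pipeline. This is a minimal sketch: the model ID is real, but the prompt and generation settings are illustrative. The transformers import is deferred to the main guard so the prompt helper stays importable without the library installed.

```python
def build_chat(user_prompt: str) -> list[dict]:
    """Wrap a prompt in the chat-message format Transformers pipelines accept."""
    return [{"role": "user", "content": user_prompt}]

if __name__ == "__main__":
    # Deferred import: loading the 8B checkpoint requires the transformers
    # library and suitable GPU memory.
    from transformers import pipeline

    generator = pipeline("text-generation", model="kangdawei/DAPO-No-DS-8B")
    messages = build_chat("Summarize the DAPO training method in two sentences.")
    out = generator(messages, max_new_tokens=256, do_sample=True, temperature=0.7)
    # The pipeline returns the full conversation; the last message is the reply.
    print(out[0]["generated_text"][-1]["content"])
```

Because the model supports a 32,768-token context, multi-turn histories can simply be appended to the messages list before each call.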