kangdawei/MMR-Sigmoid-DAPO-8B

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quantization: FP8 · Context Length: 32k · Published: Dec 18, 2025 · Architecture: Transformer

The kangdawei/MMR-Sigmoid-DAPO-8B is an 8-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B. It was trained with the TRL library and the DAPO reinforcement learning method on the knoveleng/open-rs dataset, and supports a 32768-token context length.


Model Overview

The kangdawei/MMR-Sigmoid-DAPO-8B is an 8-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B using the TRL library and the DAPO reinforcement learning method.

Key Characteristics

  • Base Model: Fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B.
  • Training Method: Trained with DAPO, the reinforcement learning approach introduced in "DAPO: An Open-Source LLM Reinforcement Learning System at Scale" (see the sketch after this list).
  • Dataset: Trained on knoveleng/open-rs, a reasoning-focused dataset, suggesting specialization in the areas it covers.
  • Context Length: Supports a 32768-token context window.
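
The sketch below is a minimal illustration of what DAPO-style RL fine-tuning with TRL might look like, assuming a recent TRL release whose GRPOTrainer exposes the DAPO "clip-higher" option (epsilon_high). The reward function, hyperparameter values, and the dataset column rename are illustrative assumptions, not the recipe actually used to train this model.

```python
# Illustrative sketch only: DAPO-style RL fine-tuning with TRL's GRPOTrainer.
# All hyperparameters and the reward function are assumptions for demonstration.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("knoveleng/open-rs", split="train")
# GRPOTrainer expects a "prompt" column; the source column name is an assumption.
dataset = dataset.rename_column("problem", "prompt")

def reward_len(completions, **kwargs):
    # Placeholder reward that prefers completions near 200 characters.
    # A real setup would score answer correctness instead.
    return [-abs(200 - len(c)) / 200 for c in completions]

config = GRPOConfig(
    output_dir="mmr-sigmoid-dapo-8b",
    num_generations=8,           # group size for the relative advantage estimate
    max_completion_length=1024,
    epsilon=0.2,                 # lower clipping bound
    epsilon_high=0.28,           # DAPO's "clip-higher" upper bound (assumed value)
    beta=0.0,                    # DAPO drops the KL penalty
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    reward_funcs=reward_len,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```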

Usage

This model is suitable for text generation tasks, particularly those that benefit from its reasoning-focused training and DAPO optimization. Developers can integrate it with the Hugging Face transformers library; a minimal example follows.
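
The snippet below is a minimal generation sketch with transformers. The prompt, dtype, and sampling parameters (temperature, max_new_tokens) are illustrative assumptions; adjust them for your hardware and task.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kangdawei/MMR-Sigmoid-DAPO-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; pick what your GPU supports
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain the difference between BFS and DFS."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, temperature=0.6, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```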