kangdawei/MMR-Sigmoid-DAPO-7B

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Dec 18, 2025 · Architecture: Transformer

kangdawei/MMR-Sigmoid-DAPO-7B is a 7.6 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B. It was trained with the DAPO reinforcement learning method on the knoveleng/open-rs dataset and supports a 131,072-token context length. The model targets the kinds of tasks represented in its training data, where DAPO fine-tuning is expected to improve performance.


Model Overview

MMR-Sigmoid-DAPO-7B is a 7.6 billion parameter language model developed by kangdawei. It is built on the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B base model and fine-tuned with DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization). Training was conducted on the knoveleng/open-rs dataset using the TRL (Transformer Reinforcement Learning) framework.
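
The author's exact training configuration is not published here, but the sketch below illustrates what DAPO-style training with TRL's `GRPOTrainer` could look like. Everything in it is an illustrative assumption rather than the author's recipe: the reward function is a placeholder, the hyperparameters are invented, the `problem` column name should be checked against the dataset card, and the `epsilon_high` parameter (DAPO's decoupled "clip-higher" bound) is assumed from recent TRL releases.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a "prompt" column; the source column name is assumed.
dataset = load_dataset("knoveleng/open-rs", split="train")
dataset = dataset.map(lambda row: {"prompt": row["problem"]})

def reward_fn(completions, **kwargs):
    # Placeholder reward: real DAPO training would score completions
    # against reference answers (e.g. exact match on the boxed result).
    return [1.0 if "\\boxed" in c else 0.0 for c in completions]

config = GRPOConfig(
    output_dir="MMR-Sigmoid-DAPO-7B",
    num_generations=8,          # completions sampled per prompt (group size)
    beta=0.0,                   # DAPO drops the KL penalty
    epsilon=0.2,                # lower clipping bound
    epsilon_high=0.28,          # DAPO's decoupled "clip-higher" upper bound
    max_completion_length=4096,
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    reward_funcs=reward_fn,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```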

Key Features

  • Base Model: DeepSeek-R1-Distill-Qwen-7B, a 7.6 billion parameter model.
  • Fine-tuning Method: Utilizes DAPO, an open-source reinforcement learning system for LLMs, as detailed in the paper "DAPO: An Open-Source LLM Reinforcement Learning System at Scale" (arXiv:2503.14476).
  • Training Data: Fine-tuned on the knoveleng/open-rs dataset.
  • Context Length: Supports a context window of 131,072 tokens (128K).
  • Frameworks: Developed using TRL, Transformers, PyTorch, Datasets, and Tokenizers.
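
As a loading reference, the snippet below shows standard Hugging Face Transformers inference with this checkpoint. It assumes the repository ships a chat template inherited from the DeepSeek-R1-Distill base; adjust dtype, sampling settings, and device placement to your hardware.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kangdawei/MMR-Sigmoid-DAPO-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",
)

messages = [{"role": "user", "content": "What is 13 * 17? Show your reasoning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```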

Use Cases

This model is suited to applications that benefit from a language model fine-tuned with reinforcement learning on the knoveleng/open-rs dataset. Its large context window and DAPO training point to tasks that require nuanced, multi-step generation grounded in that dataset's characteristics.
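
For serving, a minimal vLLM sketch is shown below. Note that the hosting metadata above lists a 32k serving context even though the weights advertise 131,072 tokens, so the `max_model_len` here is a deliberately conservative assumption; for chat-style use you would typically also apply the model's chat template rather than a raw prompt.

```python
from vllm import LLM, SamplingParams

# Conservative serving context; raise toward 131072 if memory allows.
llm = LLM(model="kangdawei/MMR-Sigmoid-DAPO-7B", max_model_len=32768)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

# Raw prompt for brevity; prefer the chat template for instruct-style queries.
outputs = llm.generate(["Prove that the sum of two even numbers is even."], params)
print(outputs[0].outputs[0].text)
```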