kangdawei/MMR-Sigmoid-DAPO-8B
kangdawei/MMR-Sigmoid-DAPO-8B is an 8-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B. It was trained with the TRL library and the DAPO reinforcement learning method on the knoveleng/open-rs dataset, and supports a 32,768-token context window.
Model Overview
kangdawei/MMR-Sigmoid-DAPO-8B is an 8-billion-parameter language model based on deepseek-ai/DeepSeek-R1-Distill-Llama-8B. It was fine-tuned with the TRL library using the DAPO reinforcement learning method.
Key Characteristics
- Base Model: Fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B.
- Training Method: Trained with DAPO ("DAPO: An Open-Source LLM Reinforcement Learning System at Scale"), a reinforcement learning approach.
- Dataset: Trained on the knoveleng/open-rs dataset, suggesting specialization in the areas covered by that data.
- Context Length: Supports a context window of 32,768 tokens.
Usage
This model is suitable for text generation tasks, particularly those benefiting from its specialized training and the DAPO optimization. Developers can integrate it using the Hugging Face transformers library, as demonstrated in the quick start example provided in the original model card.
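A minimal loading sketch with the transformers library is shown below. The model ID comes from this card; the prompt and generation parameters are illustrative assumptions, not settings recommended by the model authors.

```python
# Sketch: load kangdawei/MMR-Sigmoid-DAPO-8B and generate text.
# Prompt and max_new_tokens are arbitrary examples.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kangdawei/MMR-Sigmoid-DAPO-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place weights on available GPU(s)/CPU
)

prompt = "Explain reinforcement learning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
generated = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```

Since the checkpoint is a DeepSeek-R1 distill, responses may include extended reasoning before the final answer, so a generous `max_new_tokens` budget is advisable.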