kangdawei/DAPO-No-DS-7B
The DAPO-No-DS-7B model by kangdawei is a 7.6-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B. It was trained with the DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) reinforcement learning method on the knoveleng/open-rs dataset and offers a 131,072-token context length. The model targets tasks that benefit from reinforcement learning at scale and was trained with the TRL framework.
Model Overview
The DAPO-No-DS-7B is a 7.6-billion-parameter language model developed by kangdawei. It is a fine-tuned variant of the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B base model, trained with DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization), the reinforcement learning method introduced in the paper "DAPO: An Open-Source LLM Reinforcement Learning System at Scale" (arXiv:2503.14476), which scales RL training to enhance model performance.
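The decoupled clipping that gives DAPO part of its name can be sketched in a few lines of NumPy. This follows the "Clip-Higher" objective described in the paper; the default ε values below (0.2 low, 0.28 high) are the paper's defaults, not necessarily this checkpoint's training configuration.

```python
import numpy as np

def dapo_clipped_objective(ratios, advantages, eps_low=0.2, eps_high=0.28):
    """PPO-style surrogate with decoupled ("Clip-Higher") clip ranges.

    ratios:     pi_theta(token) / pi_old(token) for each token
    advantages: group-relative advantage estimate for each token

    Decoupling eps_high > eps_low lets low-probability tokens gain
    probability mass faster, which the DAPO paper uses to counteract
    entropy collapse during RL training.
    """
    ratios = np.asarray(ratios, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps_low, 1.0 + eps_high) * advantages
    # Token-level mean: every token in the batch weighs equally.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

For example, with a ratio of 1.5 and advantage +1, the clipped branch caps the contribution at 1.28 rather than the symmetric PPO cap of 1.2.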
Key Characteristics
- Base Model: Fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B.
- Training Method: Utilizes the DAPO method for reinforcement learning at scale.
- Dataset: Trained on the knoveleng/open-rs dataset.
- Framework: Developed using the TRL (Transformer Reinforcement Learning) library.
- Context Length: Supports a substantial context window of 131,072 tokens.
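Besides decoupled clipping, the DAPO paper also replaces sample-level loss averaging with token-level averaging. A minimal NumPy sketch of the difference, illustrative of the paper's recipe rather than this repo's exact training code:

```python
import numpy as np

def sample_level_mean(token_losses):
    """GRPO-style aggregation: average tokens within each response,
    then average across responses. Tokens in long responses are
    effectively down-weighted."""
    return float(np.mean([np.mean(np.asarray(seq, dtype=float))
                          for seq in token_losses]))

def token_level_mean(token_losses):
    """DAPO-style aggregation: average over all tokens in the batch,
    so every token contributes equally regardless of response length."""
    flat = np.concatenate([np.asarray(seq, dtype=float)
                           for seq in token_losses])
    return float(np.mean(flat))
```

With a batch of two responses `[[1.0, 1.0], [4.0]]`, sample-level averaging gives 2.5 while token-level averaging gives 2.0, since the two tokens of the first response each count once.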
Intended Use
This model is suitable for applications requiring a large language model that has benefited from advanced reinforcement learning techniques. Its training on the knoveleng/open-rs dataset suggests potential strengths in areas related to the dataset's content, while the DAPO method aims to improve alignment and response quality through scaled reinforcement learning.
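The DAPO paper's dynamic sampling component (the card does not state whether it was enabled for this particular run) drops prompt groups whose sampled rollouts all receive the same reward, since their group-normalized advantages vanish and contribute no gradient. A sketch, with `group_advantages` following the GRPO-style within-group standardization that DAPO builds on:

```python
import numpy as np

def has_gradient_signal(rewards):
    """Keep a prompt group only if its rollouts disagree: groups that
    are all-correct or all-wrong yield zero group-relative advantage."""
    rewards = np.asarray(rewards, dtype=float)
    return bool(rewards.std() > 0.0)

def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize rewards within the group
    of rollouts sampled for the same prompt."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)
```

Filtering with `has_gradient_signal` before computing `group_advantages` is what keeps every retained group informative during scaled RL training.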