reasonrag/Qwen2.5-7B-Instruct-ReasonRAG

Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: May 19, 2025 · License: other · Architecture: Transformer

reasonrag/Qwen2.5-7B-Instruct-ReasonRAG is a 7.6-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-7B-Instruct. It is optimized for reasoning tasks and reaches a rewards/accuracies score of 0.6204 on its evaluation set, making it suitable for applications that require enhanced logical coherence and problem-solving capabilities.


Overview

reasonrag/Qwen2.5-7B-Instruct-ReasonRAG is an instruction-tuned language model based on the Qwen2.5-7B-Instruct architecture. It has been fine-tuned using the dpo_mcts_rag_v8 dataset, focusing on improving its reasoning and response generation capabilities through Direct Preference Optimization (DPO).

Key Capabilities

  • Enhanced Reasoning: Achieves a rewards/accuracies score of 0.6204 on its evaluation set, meaning the model assigns the higher implicit reward to the preferred response in roughly 62% of evaluation preference pairs.
  • Instruction Following: Benefits from its base Qwen2.5-7B-Instruct model's strong instruction-following abilities, further refined by DPO.
  • Optimized for Preference: The DPO training process aims to align the model's outputs more closely with human preferences, leading to higher-quality and more relevant responses.
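The preference alignment described above can be made concrete with a minimal sketch of the DPO objective for a single preference pair. This is an illustrative reimplementation, not the training code used for this model; the log-probability inputs and the beta value are hypothetical placeholders. The "rewards" it returns correspond to the rewards/chosen and rewards/rejected metrics reported in the training details: beta times the policy-vs-reference log-probability gap for each response.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    The implicit reward of a response is beta * (policy log-prob minus
    reference log-prob); the loss pushes the chosen reward above the
    rejected reward.
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)): small when the chosen response wins by a wide margin
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, reward_chosen, reward_rejected

# Example with made-up log-probabilities: the policy prefers the chosen
# response more strongly than the reference does, so the margin is positive.
loss, rc, rr = dpo_loss(-12.0, -20.0, -14.0, -18.0)
```

A positive reward margin (as with this model's 1.0146 chosen vs. -0.5767 rejected) means the fine-tuned policy separates preferred from dispreferred responses more sharply than the frozen reference model.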

Training Details

The model was trained with a learning rate of 1e-06, a per-device batch size of 1 with gradient accumulation over 12 steps (effective batch size 12), and a cosine learning rate scheduler with a 0.2 warmup ratio, for 1 epoch. Training ended with a final loss of 0.8564, a rewards/chosen score of 1.0146, and a rewards/rejected score of -0.5767; the positive margin of roughly 1.59 between chosen and rejected rewards indicates that the model learned to differentiate preferred from rejected responses.
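The schedule described above (linear warmup for the first 20% of steps, then cosine decay from the 1e-06 peak) can be sketched as follows. This is a generic illustration of the warmup-plus-cosine pattern, not the exact scheduler implementation used in training; the step counts are hypothetical.

```python
import math

def lr_at_step(step, total_steps, peak_lr=1e-6, warmup_ratio=0.2):
    """Learning rate at a given optimizer step under linear warmup
    followed by cosine decay to ~0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from ~0 up to the peak learning rate
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# With 100 total steps and warmup_ratio=0.2, the peak 1e-06 is reached
# at the end of step 19, then the rate decays smoothly toward zero.
```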

When to Use This Model

This model is particularly well-suited for use cases where response quality, logical coherence, and alignment with human preferences are critical. It can be applied in scenarios requiring robust instruction following and improved reasoning over its base model.