The tanliboy/lambda-qwen2.5-14b-dpo-test is a 14.8-billion-parameter language model fine-tuned from Qwen/Qwen2.5-14B-Instruct. It supports a 131,072-token context length and has been optimized with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset. The model is designed for tasks that require nuanced, preference-aligned understanding and generation, and it demonstrates improved reward metrics over its base model.
Model Overview
Built on the Qwen2.5-14B-Instruct architecture, the model has undergone further fine-tuning with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset, with the aim of aligning its outputs more closely with human preferences.
Key Characteristics
- Base Model: Qwen/Qwen2.5-14B-Instruct
- Parameter Count: 14.8 billion
- Context Length: 131,072 tokens
- Optimization Method: Direct Preference Optimization (DPO)
- Training Data: HuggingFaceH4/ultrafeedback_binarized dataset
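The snippet below is a minimal loading-and-generation sketch using the Hugging Face transformers library. The prompt and generation settings are illustrative assumptions rather than values published with this model.

```python
# Minimal sketch: load the model and run one chat-style generation.
# Prompt and generation settings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tanliboy/lambda-qwen2.5-14b-dpo-test"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # shard across available GPUs
)

messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```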
Performance Highlights
During evaluation, the model achieved a rewards accuracy of 0.7400 and a rewards margin of 0.8984; in other words, it assigns a higher implicit reward to the preferred response than to the rejected one in 74% of evaluation pairs, with an average reward gap of roughly 0.9. Training used a learning rate of 5e-07 and a total batch size of 128 across 8 GPUs, and ran for 1 epoch.
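For illustration, the sketch below shows how a comparable DPO run could be configured with the trl library's DPOTrainer. The per-device batch size and gradient-accumulation split are assumptions chosen so that 8 GPUs reproduce the reported total batch size of 128 (8 × 4 × 4), the output_dir is a placeholder, and argument names may differ slightly across trl versions.

```python
# Hedged sketch of a comparable DPO run; not the authors' exact training script.
# Assumed split: 8 GPUs x 4 per device x 4 accumulation steps = 128 total batch size.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "Qwen/Qwen2.5-14B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")

# Preference pairs (prompt / chosen / rejected) used for DPO training.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

config = DPOConfig(
    output_dir="lambda-qwen2.5-14b-dpo",  # placeholder path
    learning_rate=5e-7,                   # reported learning rate
    num_train_epochs=1,                   # reported epoch count
    per_device_train_batch_size=4,        # assumed per-GPU share of the 128 batch
    gradient_accumulation_steps=4,        # assumed accumulation to reach 128
)

trainer = DPOTrainer(
    model=model,                # the policy being optimized
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer, # tokenizer=... in older trl versions
)
trainer.train()
```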
Intended Use Cases
This model is particularly well-suited to applications where alignment with human feedback and high-quality, preference-aligned text generation are crucial. Its DPO fine-tuning makes it a strong candidate for tasks requiring nuanced response generation and improved conversational quality.