Model Overview
albertfares/MNLP_SFT_DPO is a 0.8-billion-parameter language model developed by albertfares. It is a fine-tuned variant of Qwen/Qwen3-0.6B-Base, trained with filtered Direct Preference Optimization (fDPO) on the MNLP M3 DPO dataset (~69,000 samples) to align the model's outputs with human preferences.
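For orientation, here is a minimal PyTorch sketch of the standard DPO objective that fDPO builds on. The function name and the `beta` default are illustrative assumptions, not confirmed training settings, and the data-filtering step that distinguishes fDPO from plain DPO is not shown.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratios of the trained policy against a frozen reference model.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Push the margin between chosen and rejected responses apart;
    # beta=0.1 is a common default, assumed here for illustration.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```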
Key Characteristics
- Base Model: Qwen/Qwen3-0.6B-Base
- Fine-tuning Method: Filtered Direct Preference Optimization (fDPO)
- Training Data: MNLP M3 DPO dataset (~69k samples)
- Parameter Count: 0.8 billion
- Context Length: 40,960 tokens
- Format: SafeTensors weights, for safer and faster loading (see the loading sketch after this list).
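Since the weights ship as SafeTensors, they load directly with the transformers library. A minimal loading sketch, assuming the repo id from this card; the dtype choice is an assumption:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "albertfares/MNLP_SFT_DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# SafeTensors weights are picked up automatically when present in the repo.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
```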
Use Cases
This model is particularly suited for applications where preference-based alignment is crucial, such as:
- Generating responses that adhere to specific stylistic or content preferences.
- Tasks requiring nuanced understanding and generation based on comparative feedback.
- Scenarios where a small, efficiently loaded model with DPO-enhanced alignment is beneficial (a usage sketch follows this list).
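As a quick start, a usage sketch with the transformers pipeline API; the prompt and generation settings are illustrative assumptions, not recommendations from the model authors:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="albertfares/MNLP_SFT_DPO")
# Illustrative prompt and settings; tune max_new_tokens and sampling as needed.
output = generator(
    "Rewrite this sentence in a formal tone: the results look pretty good.",
    max_new_tokens=64,
    do_sample=False,
)
print(output[0]["generated_text"])
```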