CriteriaPO/llama3.2-3b-dpo-finegrained

  • Task: Text Generation
  • Model Size: 3.2B parameters
  • Quantization: BF16
  • Context Length: 32k
  • Published: May 15, 2025
  • Architecture: Transformer

CriteriaPO/llama3.2-3b-dpo-finegrained is a 3 billion parameter language model developed by CriteriaPO and fine-tuned from CriteriaPO/llama3.2-3b-sft-10. It was trained with Direct Preference Optimization (DPO) to align its outputs with human preferences, making it well suited to general text generation tasks where nuanced, preference-aligned responses are critical.


Model Overview

CriteriaPO/llama3.2-3b-dpo-finegrained is a 3 billion parameter language model developed by CriteriaPO. It is a fine-tuned iteration of the CriteriaPO/llama3.2-3b-sft-10 base model, optimized with Direct Preference Optimization (DPO). DPO is a training method that aligns a model's outputs with human preferences by training directly on preference pairs rather than fitting a separate reward model, as detailed in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model."
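As a minimal usage sketch (not an official snippet from this card), the model can be loaded through the Hugging Face transformers pipeline. The sampling parameters below are illustrative placeholders; the BF16 dtype mirrors the quantization listed in the metadata above.

```python
# Minimal text-generation sketch using the transformers pipeline.
# Sampling settings are illustrative, not values published with the model.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="CriteriaPO/llama3.2-3b-dpo-finegrained",
    torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
    device_map="auto",
)

result = generator(
    "Explain the difference between supervised fine-tuning and DPO.",
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
print(result[0]["generated_text"])
```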

Key Capabilities

  • Preference-Aligned Text Generation: The DPO fine-tuning process enables the model to generate responses that are more aligned with desired human preferences, leading to higher quality and more relevant outputs.
  • Instruction Following: As a fine-tuned model, it is capable of understanding and responding to user instructions effectively.
  • General Purpose Language Tasks: Suitable for a variety of text generation applications, including question answering, creative writing, and conversational AI (see the chat sketch after this list).
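For conversational use, a minimal sketch is shown below. It assumes the tokenizer ships with a chat template, which is typical for Llama 3.2 fine-tunes but is not confirmed by this card; the messages and sampling settings are illustrative.

```python
# Conversational sketch; assumes the tokenizer provides a chat template,
# which is typical for Llama 3.2 fine-tunes but not confirmed here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CriteriaPO/llama3.2-3b-dpo-finegrained"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Give me three ideas for a short story opening."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200, do_sample=True, temperature=0.8)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```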

Training Details

The model was trained with the TRL (Transformer Reinforcement Learning) library, version 0.12.2, alongside Transformers 4.46.3 and PyTorch 2.1.2+cu121. The training procedure applied DPO on top of the supervised fine-tuned predecessor, using preference data to refine the model's responses.
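The card does not publish the training script, dataset, or hyperparameters. As a rough sketch of what DPO fine-tuning with TRL 0.12 looks like, the example below starts from the SFT predecessor; the dataset name, beta, learning rate, and batch size are placeholder assumptions, not the values used for this model.

```python
# Sketch of DPO fine-tuning with TRL 0.12. The dataset name and all
# hyperparameters are placeholders, not this model's actual recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "CriteriaPO/llama3.2-3b-sft-10"  # the SFT predecessor named above
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# A DPO dataset needs "prompt", "chosen", and "rejected" columns;
# this dataset identifier is hypothetical.
train_dataset = load_dataset("your-org/preference-pairs", split="train")

args = DPOConfig(
    output_dir="llama3.2-3b-dpo-finegrained",
    beta=0.1,                        # placeholder KL-penalty strength
    per_device_train_batch_size=2,   # placeholder
    learning_rate=5e-7,              # placeholder; DPO typically uses small LRs
    num_train_epochs=1,              # placeholder
)

trainer = DPOTrainer(
    model=model,                 # ref_model omitted; TRL clones one internally
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Because DPO optimizes the policy directly against a frozen reference copy of the model, no separate reward model needs to be trained or loaded.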