Overview
princeton-nlp/Llama-3-Instruct-8B-DPO is an 8-billion-parameter instruction-tuned language model. Developed by princeton-nlp, it is built on the Llama-3 instruct architecture and has an 8192-token context window. As the name indicates, it is fine-tuned with DPO (Direct Preference Optimization), which aligns the model to pairwise preference data directly, without training a separate reward model; it was released alongside the SimPO work cited under Training Details.
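A minimal inference sketch follows, assuming the standard transformers text-generation workflow and that the tokenizer ships the Llama-3 chat template; the prompt and sampling parameters are illustrative, not recommended settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Instruct-8B-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat-formatted prompt and generate a response.
messages = [{"role": "user", "content": "Explain what preference optimization does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```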
Key Capabilities
- Instruction Following: Optimized for understanding and executing user instructions.
- Preference Alignment: Fine-tuned with DPO so that outputs follow pairwise human preference data without a separately trained reward model (see the sketch after this list).
- Conversational AI: Suitable for generating coherent and contextually relevant responses in dialogue systems.
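To make the preference-alignment point concrete, here is a minimal sketch of the standard DPO objective over a batch of preference pairs. This is not the project's training code: the log-probability tensors are assumed to be summed over each response's tokens, and the beta value is illustrative.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: how much the policy up-weights each response
    # relative to the frozen reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```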
Training Details
This model was released as part of the work described in the preprint SimPO: Simple Preference Optimization with a Reference-Free Reward, where it serves as a DPO-trained point of comparison. Further technical details and resources are available in the associated repository.
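For contrast with the DPO objective sketched above, the cited preprint's reference-free alternative scores each response by its length-normalized policy log-probability, so no reference model enters the loss. The sketch below follows that description; the beta and gamma values are illustrative, not the released hyperparameters.

```python
import torch.nn.functional as F

def simpo_loss(policy_chosen_logps, policy_rejected_logps,
               chosen_lengths, rejected_lengths, beta=2.0, gamma=1.0):
    # Reference-free rewards: length-normalized policy log-probabilities.
    chosen_rewards = beta * policy_chosen_logps / chosen_lengths
    rejected_rewards = beta * policy_rejected_logps / rejected_lengths
    # Require the chosen response to beat the rejected one by a margin gamma.
    return -F.logsigmoid(chosen_rewards - rejected_rewards - gamma).mean()
```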
Use Cases
- General-purpose instruction following.
- Chatbot development and conversational agents.
- Tasks requiring preference-aligned text generation.