princeton-nlp/Llama-3-Instruct-8B-CPO-v0.2

Overview

princeton-nlp/Llama-3-Instruct-8B-CPO-v0.2 is an 8-billion-parameter instruction-tuned language model. It is built on the Llama 3 architecture, features an 8192-token context window, and is served in FP8 precision on this platform.

Key Differentiator

This model's primary distinction is its training methodology. As the name indicates, it was fine-tuned with CPO (Contrastive Preference Optimization) and released alongside the preprint SimPO: Simple Preference Optimization with a Reference-Free Reward, which studies CPO as one of several preference-optimization baselines. Like SimPO, CPO is reference-free: it optimizes preference data without a separate frozen reference model, aiming to improve instruction-following with a simpler and cheaper training setup.
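To make "reference-free reward" concrete, the SimPO preprint's objective can be sketched numerically: the implicit reward is the length-normalized average log-probability of a response under the policy itself, with no reference model. A minimal, illustrative Python version for a single preference pair (the function name and default hyperparameters here are assumptions for illustration, not the released training configuration):

```python
import math

def simpo_loss(sum_logp_chosen, len_chosen,
               sum_logp_rejected, len_rejected,
               beta=2.0, gamma=1.0):
    """SimPO-style loss for one (chosen, rejected) preference pair.

    The implicit reward is beta times the average per-token
    log-probability of a response under the policy; gamma is a
    target reward margin. No reference model appears anywhere.
    """
    reward_chosen = beta * sum_logp_chosen / len_chosen
    reward_rejected = beta * sum_logp_rejected / len_rejected
    margin = reward_chosen - reward_rejected - gamma
    # Negative log-sigmoid of the margin (Bradley-Terry-style objective).
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the chosen response's per-token log-probability
# pulls ahead of the rejected one's.
easy = simpo_loss(-20.0, 10, -60.0, 10)   # large margin, small loss
hard = simpo_loss(-55.0, 10, -60.0, 10)   # small margin, larger loss
```

In actual training this quantity is averaged over batches of pairs, with the log-probabilities computed by the policy being trained, and beta/gamma tuned per model.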

Use Cases

  • General Instruction Following: Excels at tasks requiring adherence to specific instructions.
  • Research in Preference Optimization: Useful for researchers exploring new methods in alignment and fine-tuning, particularly those interested in reference-free reward models.

For more technical details and implementation specifics, refer to the associated repository.
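As a Llama-3-Instruct derivative, the model expects the Llama 3 chat format at inference time. A minimal sketch of assembling a single-turn prompt by hand (in practice the tokenizer's chat template does this for you; the special-token strings follow the Llama 3 release, and the helper name is illustrative):

```python
def build_llama3_prompt(user_message, system_message=None):
    """Assemble a single-turn Llama 3 Instruct prompt string."""
    parts = ["<|begin_of_text|>"]
    if system_message is not None:
        parts.append("<|start_header_id|>system<|end_header_id|>\n\n"
                     f"{system_message}<|eot_id|>")
    parts.append("<|start_header_id|>user<|end_header_id|>\n\n"
                 f"{user_message}<|eot_id|>")
    # Leave the prompt open at the assistant header so the model
    # generates the assistant turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama3_prompt("Summarize SimPO in one sentence.")
```

When loading the model through a library such as Hugging Face Transformers, prefer the tokenizer's built-in chat template over manual string assembly.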