Model Overview
statking/zephyr-7b-sft-full-orpo is a 7-billion-parameter language model based on Mistral-7B-v0.1. It was fine-tuned with ORPO (Odds Ratio Preference Optimization), a reference-model-free method that folds supervised fine-tuning and preference alignment into a single training stage, on the HuggingFaceH4/ultrafeedback_binarized dataset, which pairs prompts with chosen and rejected responses to align model outputs with human preferences.
Key Characteristics
- Base Model: Mistral-7B-v0.1
- Fine-tuning Method: ORPO, a monolithic preference-optimization method that requires no separate reward or reference model.
- Training Data: HuggingFaceH4/ultrafeedback_binarized, a preference dataset of prompts paired with chosen and rejected responses.
- Performance Metrics: On the evaluation set, the model reached a rewards accuracy of 0.6587 (it assigned higher reward to the chosen response about 66% of the time), with a mean chosen log-probability of -0.7282 versus -0.9978 for rejected responses, confirming it favors chosen over rejected completions.
- Context Length: Supports an 8192-token context window.
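To make the reported metrics concrete, the sketch below computes the odds-ratio term of the ORPO loss from the chosen and rejected log-probabilities listed above. This is only the relative-ratio component (the full ORPO objective adds a standard supervised fine-tuning loss), and it assumes the reported values can be treated as sequence-level average log-probabilities:

```python
import math

def odds(logp: float) -> float:
    """Odds p / (1 - p) of a response, given its log-probability."""
    p = math.exp(logp)
    return p / (1.0 - p)

def orpo_odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    """Odds-ratio term of the ORPO loss:
    -log sigmoid(log(odds(chosen) / odds(rejected)))."""
    log_odds_ratio = math.log(odds(logp_chosen) / odds(logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))

# Evaluation log-probabilities reported for this model
loss = orpo_odds_ratio_loss(-0.7282, -0.9978)
```

Because the chosen log-probability exceeds the rejected one, the log odds ratio is positive and this term falls below log 2 (its value at indifference), which is what the training objective pushes toward.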
Intended Use Cases
This model is particularly well-suited for applications where preference alignment is critical, that is, where the goal is to produce the response a human would favor over alternatives. Its training on a binarized feedback dataset suggests strengths in:
- Instruction Following: Generating responses that adhere to user instructions and preferences.
- Dialogue Systems: Producing more helpful or preferred conversational turns.
- Content Generation: Creating outputs better aligned with criteria derived from human feedback.
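For the use cases above, a chat-style inference sketch with the `transformers` library is shown below. It assumes the model's tokenizer ships a chat template (typical for Zephyr-style models); the sampling parameters are illustrative, not recommendations from the model authors:

```python
MODEL_ID = "statking/zephyr-7b-sft-full-orpo"

def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a user prompt in the message format expected by
    tokenizer.apply_chat_template."""
    return [{"role": "user", "content": user_prompt}]

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the lightweight helper above stays usable
    # without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(
        inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7
    )
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example (downloads ~14 GB of weights on first run):
# print(generate("Explain ORPO in one paragraph."))
```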