statking/zephyr-7b-sft-full-orpo

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 8k · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

statking/zephyr-7b-sft-full-orpo is a 7 billion parameter language model fine-tuned from mistralai/Mistral-7B-v0.1. The model was trained with the ORPO method on the HuggingFaceH4/ultrafeedback_binarized dataset, reaching a rewards accuracy of 0.6587. It is optimized for tasks requiring alignment with human preferences, assigning consistently higher likelihood to chosen responses than to rejected ones.


Model Overview

statking/zephyr-7b-sft-full-orpo is a 7 billion parameter language model derived from the Mistral-7B-v0.1 architecture. It has been fine-tuned with ORPO (Odds Ratio Preference Optimization) on HuggingFaceH4/ultrafeedback_binarized, a dataset of paired chosen/rejected responses, to align model outputs with human preferences.
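
The card does not include the training recipe itself. As a hedged sketch, a comparable ORPO fine-tune can be run with TRL's ORPOTrainer; the hyperparameters below (beta, learning rate, batch size) are illustrative assumptions, not the settings used to produce this checkpoint.

```python
# Illustrative ORPO fine-tuning sketch with TRL; hyperparameters are assumptions,
# not the values used to train statking/zephyr-7b-sft-full-orpo.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
# Note: the base tokenizer may lack a chat template; Zephyr-style recipes set one
# before preference training.

# The train_prefs split carries "prompt"/"chosen"/"rejected" columns; recent TRL
# releases apply the tokenizer's chat template to these message lists automatically.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

config = ORPOConfig(
    output_dir="zephyr-7b-orpo",
    beta=0.1,                       # weight of the odds-ratio term; assumed
    max_length=8192,                # matches the model's context window
    per_device_train_batch_size=2,  # assumed
    learning_rate=5e-6,             # assumed
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,     # named `tokenizer` in older TRL releases
)
trainer.train()
```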

Key Characteristics

  • Base Model: Mistral-7B-v0.1
  • Fine-tuning Method: ORPO, designed to improve alignment and preference modeling.
  • Training Data: HuggingFaceH4/ultrafeedback_binarized, a dataset focused on chosen vs. rejected responses.
  • Performance Metrics: Achieved a rewards accuracy of 0.6587 on the evaluation set, with an average chosen log probability of -0.7282 versus -0.9978 for rejected responses, i.e., the model assigns higher likelihood to preferred answers (see the objective sketched after this list).
  • Context Length: Supports an 8192-token context window.
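
For context on these numbers (drawn from the ORPO paper, Hong et al., 2024, rather than this card): the objective adds a weighted odds-ratio term to the standard SFT loss, pushing the odds of the chosen completion y_w above those of the rejected completion y_l:

```latex
% ORPO objective (Hong et al., 2024): SFT loss plus a weighted odds-ratio term.
\mathcal{L}_{\mathrm{ORPO}}
  = \mathbb{E}_{(x,\,y_w,\,y_l)}\bigl[\mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}}\bigr],
\qquad
\mathcal{L}_{\mathrm{OR}}
  = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right),
\qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
```

Under this reading, the reported chosen/rejected log probabilities are average values of log P_θ(y|x) for the two response types, and the rewards accuracy of 0.6587 is the fraction of evaluation pairs in which the chosen response scores higher.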

Intended Use Cases

This model is well-suited to applications where preference alignment matters, i.e., where generating responses favored over plausible alternatives is critical. Its training on a binarized feedback dataset suggests strengths in the areas below (a minimal inference sketch follows the list):

  • Instruction Following: Generating responses that adhere to user instructions and preferences.
  • Dialogue Systems: Producing more helpful or preferred conversational turns.
  • Content Generation: Creating outputs that are generally better received or aligned with specific criteria based on human feedback.
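
A minimal inference sketch with transformers, assuming the repository ships a tokenizer with a Zephyr-style chat template; the prompt and sampling settings here are illustrative, not recommended values.

```python
# Minimal generation sketch; assumes the checkpoint's tokenizer ships a chat
# template, as is typical for Zephyr-style fine-tunes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "statking/zephyr-7b-sft-full-orpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain ORPO fine-tuning in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=200, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```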