Syed-Hasan-8503/Phi-3-mini-4K-instruct-cpo-simpo
Text Generation · Model Size: 4B · Quant: BF16 · Context Length: 4K · Published: Jun 24, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Syed-Hasan-8503/Phi-3-mini-4K-instruct-cpo-simpo is a 3.8-billion-parameter Phi-3-mini-4K-instruct model fine-tuned with the CPO-SimPO technique, which combines Contrastive Preference Optimization (CPO) and Simple Preference Optimization (SimPO). The model is tuned for instruction-following tasks and posts improved scores on benchmarks such as GSM8K and TruthfulQA. SimPO's length normalization and reward margin discourage long, low-quality outputs, while CPO's behavior-cloning regularizer keeps generations aligned with the learned preferences.


Overview

This model is a 3.8-billion-parameter Phi-3-mini-4K-instruct variant, developed by Syed-Hasan-8503 and fine-tuned using the CPO-SimPO technique. CPO-SimPO integrates Contrastive Preference Optimization (CPO) and Simple Preference Optimization (SimPO) to enhance the model's ability to follow instructions and generate high-quality responses.

Key Capabilities

  • Enhanced Instruction Following: Optimized for instruction-based tasks, leading to more accurate and relevant outputs.
  • Improved Performance: Demonstrates significant score improvements in benchmarks such as GSM8K (up by 8.49 points) and TruthfulQA (up by 2.07 points).
  • Quality Control: Utilizes length normalization and target reward margins from SimPO to prevent the generation of overly long or low-quality sequences.
  • Preference Integrity: Incorporates a behavior cloning regularizer from CPO to ensure the model's outputs remain consistent with preferred data distributions.

CPO-SimPO Technique

CPO-SimPO is a combined approach (a code sketch of the joint objective follows this list):

  • Contrastive Preference Optimization (CPO): Adds a behavior cloning regularizer to keep the model's behavior close to the preferred data.
  • Simple Preference Optimization (SimPO): Employs length normalization and target reward margins to improve the quality of generated sequences.
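
Below is a minimal PyTorch sketch of how these two pieces can fit together in a single objective. It is an illustration assembled from the published SimPO and CPO formulations, not the author's training code; the function name and hyperparameters (beta, gamma, bc_weight) are assumptions.

```python
# A minimal PyTorch sketch of a CPO-SimPO-style objective, assuming paired
# preference data (a "chosen" and a "rejected" response per prompt). The
# hyperparameter names (beta, gamma, bc_weight) and the exact normalization
# are illustrative assumptions, not the author's training code.
import torch
import torch.nn.functional as F

def cpo_simpo_loss(
    chosen_logps: torch.Tensor,    # summed token log-probs of chosen responses, (B,)
    rejected_logps: torch.Tensor,  # summed token log-probs of rejected responses, (B,)
    chosen_lens: torch.Tensor,     # token counts of chosen responses, (B,)
    rejected_lens: torch.Tensor,   # token counts of rejected responses, (B,)
    beta: float = 2.0,             # reward scale (SimPO)
    gamma: float = 1.0,            # target reward margin (SimPO)
    bc_weight: float = 1.0,        # weight of the CPO behavior-cloning term
) -> torch.Tensor:
    # SimPO: length-normalized implicit rewards; no reference model is needed.
    chosen_rewards = beta * chosen_logps / chosen_lens
    rejected_rewards = beta * rejected_logps / rejected_lens

    # Preference term: the chosen response must beat the rejected one by at
    # least the margin gamma, which discourages long, low-quality outputs.
    preference_loss = -F.logsigmoid(chosen_rewards - rejected_rewards - gamma)

    # CPO: a behavior-cloning (NLL) regularizer on the chosen responses keeps
    # the policy close to the preferred data distribution.
    bc_loss = -(chosen_logps / chosen_lens)

    return (preference_loss + bc_weight * bc_loss).mean()
```

Setting bc_weight to zero recovers a plain SimPO objective; the extra term is the CPO behavior-cloning regularizer that preserves preference integrity.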

When to Use This Model

This model is particularly well-suited for applications requiring robust instruction following and high-quality, concise responses. Its enhancements make it a strong candidate for tasks where accuracy in mathematical reasoning (GSM8K) and factual correctness (TruthfulQA) are critical.
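
Usage

A minimal inference sketch using the Hugging Face transformers library. The prompt and generation settings are illustrative, not tuned recommendations from the model author; older transformers versions may additionally require trust_remote_code=True for Phi-3 models.

```python
# Minimal inference sketch; BF16 matches the published quantization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Syed-Hasan-8503/Phi-3-mini-4K-instruct-cpo-simpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example prompt; this model targets instruction following and math reasoning.
messages = [
    {"role": "user",
     "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```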