SanjiWatsuki/Kunoichi-DPO-v2-7B

Visibility: Public
Parameters: 7B
Quantization: FP8
Context length: 8192
Released: Jan 13, 2024
License: cc-by-nc-4.0
Source: Hugging Face
Overview

Kunoichi-DPO-v2-7B: An Enhanced 7B Instruction-Following Model

Kunoichi-DPO-v2-7B is a 7-billion-parameter language model by SanjiWatsuki that improves instruction following through Direct Preference Optimization (DPO). It builds on previous Kunoichi versions and posts stronger results across a range of benchmarks.
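
As a rough sketch of how the model can be run locally, the snippet below assumes the repository is a standard Transformers-compatible (Mistral-architecture) checkpoint and that the transformers, torch, and accelerate packages are installed; the prompt and generation settings are illustrative, not recommendations from the model author.

```python
# Minimal sketch: load Kunoichi-DPO-v2-7B with Hugging Face Transformers.
# Assumes roughly 14 GB of GPU memory for fp16 weights; use a quantized
# variant (e.g. the FP8 build listed above) or CPU offload if that is too much.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SanjiWatsuki/Kunoichi-DPO-v2-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 for GPU inference; use torch.float32 on CPU
    device_map="auto",          # requires `accelerate`; places layers automatically
)

prompt = "Explain the difference between supervised fine-tuning and DPO in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For chat-style use, wrap the user message in the model's expected prompt template (see the sketch under Ideal Use Cases below).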

Key Capabilities and Performance

  • Strong Instruction Following: Achieves an MT-Bench score of 8.51, placing it above models such as Mixtral-8x7B-Instruct and Starling-7B on this benchmark.
  • Competitive General Performance: Averages 58.31 across AGIEval, GPT4All, TruthfulQA, and BigBench, indicating robust general knowledge and reasoning.
  • High-Quality Response Generation: Scores 17.19% on AlpacaEval 2, matching Claude 2 and outperforming many other 7B models, and some larger ones, at producing preferred responses.
  • Balanced Benchmark Results: With an MMLU score of 64.94 and a Logic Test score of 0.58, it offers a well-rounded performance profile.

Ideal Use Cases

  • General-purpose chatbots: Its strong instruction-following and conversational scores make it well suited to interactive AI applications (see the prompt sketch after this list).
  • Content generation: Capable of producing coherent and contextually relevant text for various tasks.
  • Research and development: Provides a competitive base model for further fine-tuning or experimentation in the 7B parameter class.
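
This overview does not state the model's prompt template. Kunoichi-family models are often used with an Alpaca-style instruction format, so the helper below assumes that format as a working hypothesis; check the upstream Hugging Face card before relying on it. The sketch only builds the prompt string, so it runs without downloading the model.

```python
# Hypothetical helper: build an Alpaca-style prompt for chat-like use.
# The exact template expected by Kunoichi-DPO-v2-7B is an assumption here;
# verify against the upstream model card.
def build_alpaca_prompt(instruction: str, system: str = "You are a helpful assistant.") -> str:
    return (
        f"{system}\n\n"
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Response:\n"
    )

if __name__ == "__main__":
    prompt = build_alpaca_prompt("Summarize the plot of Hamlet in three sentences.")
    print(prompt)  # feed this string to tokenizer/model.generate as in the loading sketch above
```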