SanjiWatsuki/Kunoichi-DPO-7B

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 8k · Published: Jan 11, 2024 · License: cc-by-nc-4.0 · Architecture: Transformer

Kunoichi-DPO-7B by SanjiWatsuki is a 7-billion-parameter language model created by DPO fine-tuning the Kunoichi-7B base model. It is optimized for general use, with stronger reasoning and instruction following than its base model, and supports an 8192-token context window, making it suitable for applications that demand close instruction adherence.


Kunoichi-DPO-7B: Enhanced Reasoning and Instruction Following

Kunoichi-DPO-7B is a 7-billion-parameter language model developed by SanjiWatsuki, created by applying Direct Preference Optimization (DPO) fine-tuning to the Kunoichi-7B base model, using Intel's Orca DPO pairs formatted with the Alpaca template. This fine-tuning targets general use cases, improving the model's ability to understand and follow instructions as well as its reasoning.
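The Alpaca template mentioned above follows a widely used layout. Below is a minimal sketch of a prompt builder; the exact header text and spacing are the standard Alpaca format, assumed rather than confirmed for this specific model:

```python
def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Format a request in the standard Alpaca instruction template."""
    if input_text:
        # Variant with an additional context block.
        header = ("Below is an instruction that describes a task, paired with "
                  "an input that provides further context. Write a response "
                  "that appropriately completes the request.")
        return (f"{header}\n\n### Instruction:\n{instruction}\n\n"
                f"### Input:\n{input_text}\n\n### Response:\n")
    header = ("Below is an instruction that describes a task. "
              "Write a response that appropriately completes the request.")
    return f"{header}\n\n### Instruction:\n{instruction}\n\n### Response:\n"

prompt = build_alpaca_prompt("Summarize the following paragraph in one sentence.")
```

The model's completion is then read from the text generated after the `### Response:` marker.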

Key Capabilities & Performance

  • Improved Instruction Following: The DPO fine-tuning with the Orca dataset has significantly strengthened its instruction adherence.
  • Enhanced Reasoning: Benchmarks indicate a notable improvement in reasoning tasks compared to the base Kunoichi-7B.
  • Competitive Benchmarking: Kunoichi-DPO-7B achieves an MT Bench score of 8.29 and a Logic Test score of 0.59, outperforming its base model and several other 7B models like Starling-7B and Silicon-Maid-7B in certain metrics. It also shows a higher average score (58.4) across various benchmarks compared to Kunoichi-7B (57.54).
  • Context Window: The model is designed for an 8k context window, with experimental support for up to 16k using an NTK RoPE alpha of 2.6.
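The 16k extension can be understood through the common NTK-aware RoPE scaling heuristic, which raises the rotary embedding's frequency base by a factor derived from alpha. A rough sketch, assuming the standard Mistral-family defaults of base 10000 and head dimension 128 (these values are assumptions, not stated in this card):

```python
def ntk_scaled_rope_base(alpha: float, base: float = 10000.0,
                         head_dim: int = 128) -> float:
    """NTK-aware RoPE scaling: raise the frequency base by alpha^(d/(d-2))."""
    return base * alpha ** (head_dim / (head_dim - 2))

# With the suggested alpha of 2.6 for ~16k context, the effective
# RoPE base grows to roughly 26,400 from the default 10,000.
new_base = ntk_scaled_rope_base(2.6)
```

Inference backends that expose an NTK alpha (or a dynamic `rope_scaling` factor) apply an equivalent adjustment internally.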

When to Use This Model

  • General Purpose Applications: Ideal for tasks requiring robust instruction following and general reasoning.
  • Applications Needing Stronger Adherence: If your use case demands precise responses based on given instructions, this model offers an advantage over its base version.
  • Evaluating 7B-Class Options: Its competitive benchmark results against other 7B models make it a strong candidate when comparing models in this size class.

Popular Sampler Settings

The most popular configurations among Featherless users for this model tune the following sampler parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
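To illustrate how one of these sampler settings acts, here is a minimal sketch of min_p filtering combined with temperature, following the commonly described rule of scaling the keep-threshold by the top token's probability. This is an illustration of the technique, not Featherless's implementation:

```python
import math

def min_p_filter(logits, temperature=1.0, min_p=0.05):
    """Apply temperature, then keep only tokens whose probability is at
    least min_p times the most likely token's probability; renormalize."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}

# Tokens with logits far below the leader are pruned before sampling.
dist = min_p_filter([5.0, 4.0, 1.0, 0.0], temperature=1.0, min_p=0.05)
```

Raising min_p makes sampling more conservative by cutting the tail harder; temperature, top_p, top_k, and the penalty parameters shape the distribution in complementary ways.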