Undi95/Miqu-70B-Alpaca-DPO

Text generation · Concurrency cost: 4 · Model size: 69B · Quant: FP8 · Context length: 32k · Published: Feb 6, 2024 · Architecture: Transformer

Undi95/Miqu-70B-Alpaca-DPO is a 69-billion-parameter language model: a DPO-tuned version of Miqu trained on Alpaca-formatted data. The fine-tune aims to reduce censorship and improve usability with Alpaca prompts relative to the base Miqu model. It is designed to serve as a base for further MiquMaid merges and tends to produce less censored, potentially unethical responses.

Miqu-70B-Alpaca-DPO Overview

Undi95/Miqu-70B-Alpaca-DPO is a 69-billion-parameter model derived from the Miqu base model and enhanced through Direct Preference Optimization (DPO) training. The primary goals of this fine-tune were to address the significant censorship in the original Miqu model, particularly outside of role-playing contexts, and to optimize its performance with Alpaca-formatted prompts.

Key Characteristics

  • DPO Fine-tuning: Applies Direct Preference Optimization over the preference datasets listed below to modify response behavior (a training sketch follows this list).
  • Censorship Reduction: Aims to "uncensor" the base Miqu model, allowing for a broader range of responses, including potentially unethical ones.
  • Alpaca Prompt Optimization: Specifically trained to be more usable and responsive to prompts formatted in the Alpaca style.
  • Dataset: Training involved datasets such as NobodyExistsOnTheInternet/ToxicDPOqa and Undi95/toxic-dpo-v0.1-NoWarning.
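
For readers unfamiliar with the setup, the snippet below is a minimal sketch of DPO fine-tuning with Hugging Face TRL on the two datasets named above. It is not the author's actual training script: the base-model path is a placeholder, the hyperparameters are illustrative, and DPOTrainer argument names vary across TRL versions (older releases take `tokenizer=` instead of `processing_class=`). A 70B model would additionally need multi-GPU sharding or adapter-based training, which is omitted here.

```python
# Minimal DPO fine-tuning sketch using Hugging Face TRL.
# Assumptions: both datasets expose the standard "prompt"/"chosen"/
# "rejected" preference columns, and the TRL version in use accepts
# processing_class= (older versions take tokenizer= instead).
from datasets import load_dataset, concatenate_datasets
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

BASE_MODEL = "path/to/miqu-70b-base"  # placeholder, not the exact checkpoint

model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# The two preference datasets named on this card.
train_dataset = concatenate_datasets([
    load_dataset("NobodyExistsOnTheInternet/ToxicDPOqa", split="train"),
    load_dataset("Undi95/toxic-dpo-v0.1-NoWarning", split="train"),
])

args = DPOConfig(
    output_dir="miqu-70b-alpaca-dpo",
    beta=0.1,  # strength of the KL-style preference penalty; illustrative
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
)

# ref_model is omitted; TRL builds a frozen reference copy of the model.
# At 70B scale this would need multi-GPU sharding or LoRA in practice.
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```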

Use Cases and Considerations

This model is intended as a foundational component for more complex merges, such as MiquMaid-v2-2x70B-DPO. While it offers a less censored experience than the base Miqu, users should be aware that it may generate responses that are considered unethical. The recommended prompt format is Alpaca, since the uncensoring effort was focused on that structure. To get the most out of it, the author suggests using this model within a MiquMaid merge rather than on its own as a drop-in replacement for base Miqu.
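
Since the uncensoring work targeted Alpaca-formatted inputs, the snippet below shows the standard Alpaca instruction template for building prompts. The template text is the widely used Alpaca format (no-input variant); the example instruction is purely illustrative.

```python
# Standard Alpaca instruction template (no-input variant).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

# The instruction here is an illustrative example, not from the card.
prompt = ALPACA_TEMPLATE.format(
    instruction="Explain what DPO fine-tuning does in two sentences."
)
print(prompt)
```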