Undi95/Miqu-70B-Alpaca-DPO

Text generation · Concurrency cost: 4 · Model size: 69B · Quant: FP8 · Context length: 32k · Published: Feb 6, 2024 · Architecture: Transformer

Undi95/Miqu-70B-Alpaca-DPO is a 69-billion-parameter language model: a DPO-tuned version of Miqu trained on Alpaca-formatted data. The fine-tune aims to reduce censorship and improve usability with Alpaca prompts relative to the base Miqu model. It is designed to serve as a base for further MiquMaid merges and tends to produce less censored, potentially unethical responses.

Miqu-70B-Alpaca-DPO Overview

Undi95/Miqu-70B-Alpaca-DPO is a 69-billion-parameter model derived from the Miqu base model and enhanced through Direct Preference Optimization (DPO) training. The primary goals of this fine-tune were to address the significant censorship in the original Miqu model, particularly outside of role-playing contexts, and to optimize its performance with Alpaca-formatted prompts.

Key Characteristics

  • DPO Fine-tuning: Applies Direct Preference Optimization over the preference datasets listed below to modify response behavior (a training sketch follows this list).
  • Censorship Reduction: Aims to "uncensor" the base Miqu model, allowing for a broader range of responses, including potentially unethical ones.
  • Alpaca Prompt Optimization: Specifically trained to be more usable and responsive to prompts formatted in the Alpaca style.
  • Dataset: Training involved datasets such as NobodyExistsOnTheInternet/ToxicDPOqa and Undi95/toxic-dpo-v0.1-NoWarning.
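
For readers unfamiliar with the setup, the snippet below is a minimal sketch of DPO fine-tuning with Hugging Face TRL on the two datasets named above. It is not the author's actual training script: the base-model path is a placeholder, the hyperparameters are illustrative, and DPOTrainer argument names vary across TRL versions (older releases take `tokenizer=` instead of `processing_class=`). A 70B model would additionally need multi-GPU sharding or adapter-based training, which is omitted here.

```python
# Minimal DPO fine-tuning sketch using Hugging Face TRL.
# Assumptions: both datasets expose the standard "prompt"/"chosen"/
# "rejected" preference columns, and the TRL version in use accepts
# processing_class= (older versions take tokenizer= instead).
from datasets import load_dataset, concatenate_datasets
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

BASE_MODEL = "path/to/miqu-70b-base"  # placeholder, not the exact checkpoint

model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# The two preference datasets named on this card.
train_dataset = concatenate_datasets([
    load_dataset("NobodyExistsOnTheInternet/ToxicDPOqa", split="train"),
    load_dataset("Undi95/toxic-dpo-v0.1-NoWarning", split="train"),
])

args = DPOConfig(
    output_dir="miqu-70b-alpaca-dpo",
    beta=0.1,  # strength of the KL-style preference penalty; illustrative
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
)

# ref_model is omitted; TRL builds a frozen reference copy of the model.
# At 70B scale this would need multi-GPU sharding or LoRA in practice.
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```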

Use Cases and Considerations

This model is intended as a foundational component for more complex merges, such as MiquMaid-v2-2x70B-DPO. While it offers a less censored experience than the base Miqu, users should be aware that it may generate responses that are considered unethical. The recommended prompt format is Alpaca, since the uncensoring effort was focused on that structure. To get the most out of it, the author suggests using this model within a MiquMaid merge rather than on its own as a drop-in replacement for base Miqu.
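
Since the uncensoring work targeted Alpaca-formatted inputs, the snippet below shows the standard Alpaca instruction template for building prompts. The template text is the widely used Alpaca format (no-input variant); the example instruction is purely illustrative.

```python
# Standard Alpaca instruction template (no-input variant).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

# The instruction here is an illustrative example, not from the card.
prompt = ALPACA_TEMPLATE.format(
    instruction="Explain what DPO fine-tuning does in two sentences."
)
print(prompt)
```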