Name: thu-coai/Mistral-7B-Instruct-v0.2-safeunlearning API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: thu-coai

Model Overview

The thu-coai/Mistral-7B-Instruct-v0.2-safeunlearning is a 7 billion parameter instruction-tuned language model based on the original Mistral-7B-Instruct-v0.2. This version has been specifically modified by thu-coai through a "safe unlearning" process, targeting 100 raw harmful questions during its training. The primary goal of this unlearning was to significantly improve the model's resilience against various jailbreak attacks, making it a safer option for deployment in sensitive applications.

Key Capabilities

Enhanced Safety: Demonstrates significantly improved resistance to jailbreak attempts compared to its base model.
Performance Preservation: Maintains general performance levels comparable to the original Mistral-7B-Instruct-v0.2, ensuring its utility across a broad range of tasks.
Instruction Following: Retains strong instruction-following capabilities, consistent with the base Mistral-7B-Instruct-v0.2.
Standard Prompt Format: Utilizes the same prompt format as the original Mistral-7B-Instruct-v0.2, allowing for seamless integration into existing workflows.

Good For

Applications requiring high safety: Ideal for chatbots, virtual assistants, and other conversational AI systems where mitigating harmful outputs and jailbreaks is critical.
Research into model safety and unlearning: Provides a practical example of safe unlearning techniques applied to a large language model.
General instruction-following tasks: Suitable for a wide array of natural language processing tasks where the base Mistral-7B-Instruct-v0.2 would be used, but with added safety guarantees.

Overview

Model Overview

Key Capabilities

Good For

Full Model Card (README)