What is MBIAS?
MBIAS is a 7-billion-parameter Large Language Model (LLM) developed by Ananya Raval, Veronica Chatrath, and Shaina Raza. It is fine-tuned specifically to address a critical challenge in LLM safety: reducing bias and toxicity in generated text without sacrificing contextual accuracy or knowledge retention. Unlike traditional safety interventions, which often compromise meaning, MBIAS aims to maintain high contextual relevance.
Key Capabilities & Features
- Enhanced Safety: Demonstrates a significant reduction in bias and toxicity: over 30% overall, and exceeding 90% for specific demographic groups, on out-of-distribution test sets.
- Contextual Accuracy: Prioritizes knowledge retention (KR), faithfulness, and relevancy, with MBIAS showing the highest knowledge retention (88.46%) among the Mistral2-7B configurations compared.
- Fine-tuned Approach: Utilizes a custom dataset curated for comprehensive safety interventions, covering diverse text samples to effectively test and reduce bias and toxicity.
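The fine-tuned approach described above can be pictured as standard supervised fine-tuning over pairs of unsafe text and safe rewrites. The sketch below is illustrative only: the base checkpoint (`base_id`), the prompt template, the dataset fields, and the hyperparameters are all assumptions, not details taken from this card.

```python
# Hypothetical SFT sketch for a safety-intervention dataset.
# base_id, the prompt template, and all hyperparameters are assumptions.

def to_training_text(biased: str, debiased: str) -> str:
    # Pair each unsafe sample with its safe rewrite, Mistral-instruct style
    # (the [INST] template is an assumption about the base model's format).
    return (f"<s>[INST] Rewrite the following text to remove bias and toxicity "
            f"while keeping its meaning: {biased} [/INST] {debiased}</s>")

def finetune(pairs, base_id="mistralai/Mistral-7B-Instruct-v0.2"):
    # Lazy imports keep the sketch importable without the heavy dependencies.
    from datasets import Dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tok = AutoTokenizer.from_pretrained(base_id)
    tok.pad_token = tok.eos_token  # Mistral tokenizers ship without a pad token

    # Build a causal-LM dataset of (unsafe -> safe) rewrite examples.
    ds = Dataset.from_dict({"text": [to_training_text(b, d) for b, d in pairs]})
    ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
                remove_columns=["text"])

    model = AutoModelForCausalLM.from_pretrained(base_id)
    args = TrainingArguments(output_dir="mbias-sft", num_train_epochs=1,
                             per_device_train_batch_size=1)
    Trainer(model=model, args=args, train_dataset=ds,
            data_collator=DataCollatorForLanguageModeling(tok, mlm=False)).train()
```

In practice a 7B model would be fine-tuned with parameter-efficient methods (e.g. LoRA) on accelerator hardware; the full-parameter `Trainer` call here is only the simplest shape of the pipeline.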
Performance Highlights
Compared to a vanilla Mistral2-7B model and a prompt-tuned version, MBIAS shows:
- Bias Reduction: Bias falls to 6.63% post-intervention with MBIAS, compared to 9.49% for vanilla Mistral2-7B and 11.4% for the prompt-tuned version.
- Toxicity Reduction: Toxicity falls to 4.50% post-intervention with MBIAS, compared to 8.71% for vanilla Mistral2-7B and 8.00% for the prompt-tuned version.
- Knowledge Retention: Achieves the highest knowledge retention at 88.46%, surpassing vanilla (82.32%) and prompt-tuned (81.45%) Mistral2-7B.
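The "over 30%" overall reduction cited under Key Capabilities is consistent with the figures above: moving from 9.49% to 6.63% bias is a relative drop of roughly 30%, and from 8.71% to 4.50% toxicity roughly 48%. A minimal sketch to reproduce the arithmetic:

```python
def relative_reduction(baseline: float, post: float) -> float:
    """Percent reduction from a baseline value to a post-intervention value."""
    return (baseline - post) / baseline * 100

# Bias and toxicity figures from this card
bias_drop = relative_reduction(9.49, 6.63)
tox_drop = relative_reduction(8.71, 4.50)
print(f"bias: {bias_drop:.1f}%, toxicity: {tox_drop:.1f}%")
# -> bias: 30.1%, toxicity: 48.3%
```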
Intended Use Cases
MBIAS is intended for research and development in applications where mitigating bias and toxicity in language generation is crucial, and where preserving the original contextual meaning and factual accuracy of the generated content is essential.
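For such use cases, the model can be queried like any causal LM through Hugging Face `transformers`. This is a minimal sketch, assuming a Hub repo id of `newsmediabias/MBIAS` and a Mistral-instruct `[INST]` prompt template; neither is confirmed by this card.

```python
# Hypothetical inference sketch; MODEL_ID and the prompt template are assumptions.
MODEL_ID = "newsmediabias/MBIAS"  # assumed Hugging Face repo id

def build_prompt(text: str) -> str:
    # Instruction-style prompt asking the model to rewrite text safely
    return (f"<s>[INST] Rewrite the following text to remove bias and toxicity "
            f"while preserving its meaning:\n{text} [/INST]")

def debias(text: str) -> str:
    # Lazy imports keep the sketch importable without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_prompt(text), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens, skipping the echoed prompt
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

Greedy decoding (`do_sample=False`) is used here so that safety rewrites are deterministic; sampling parameters can be tuned per application.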