cosmicvalor/mistral-orthogonalized
cosmicvalor/mistral-orthogonalized is a 7-billion-parameter language model based on the Mistral architecture, developed by cosmicvalor. The model was modified using an orthogonalization method inspired by research into refusal in LLMs. It is intended for research use, to explore and understand refusal-related model behavior.
Model Overview
cosmicvalor/mistral-orthogonalized is a 7-billion-parameter language model built on the Mistral architecture. Its development was inspired by the research write-up "Refusal in LLMs is mediated by a single direction," which explores how refusal behavior is encoded within large language models.
Key Characteristics
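- Orthogonalization Method: The model has been modified with an orthogonalization technique. In the cited research, orthogonalization means removing a single "refusal direction" from what the model's weights can write into its internal activations, which makes it possible to study how that direction mediates refusal.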
- Research Focus: It is explicitly designated for research purposes, providing a tool for academics and developers to investigate model alignment and safety.
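The general idea behind weight orthogonalization can be illustrated with a small NumPy sketch. This is not the procedure used to produce this model, whose details are not given here; the function name `orthogonalize`, the matrix shapes, and the random direction are all assumptions chosen for illustration. Given a unit vector r, the matrix W is replaced with W - r rᵀ W, so the outputs of W no longer have any component along r.

```python
import numpy as np


def orthogonalize(weight: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project `direction` out of the output space of `weight`.

    weight:    (d_model, d_in) matrix whose outputs live in a d_model-dim space.
    direction: (d_model,) vector; it is normalized to unit length here.
    Returns W - r r^T W, so that r^T (W' x) == 0 for every input x.
    """
    r = direction / np.linalg.norm(direction)
    return weight - np.outer(r, r) @ weight


# Toy demonstration with random data (hypothetical sizes, no real model involved).
rng = np.random.default_rng(0)
d_model, d_in = 8, 4
W = rng.normal(size=(d_model, d_in))
r = rng.normal(size=d_model)

W_orth = orthogonalize(W, r)

# Any output of the modified matrix has (numerically) zero component along r.
x = rng.normal(size=d_in)
proj = float((r / np.linalg.norm(r)) @ (W_orth @ x))
print(f"component along r after orthogonalization: {proj:.2e}")
```

The projection is idempotent: applying `orthogonalize` a second time with the same direction leaves the matrix unchanged, which makes it a convenient sanity check.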
Intended Use
This model is primarily designed for:
- Investigating LLM Refusal: Researchers can use this model to study how refusal behaviors manifest and can be influenced or controlled within LLMs.
- Exploring Alignment Techniques: It serves as a platform for experimenting with methods to steer model outputs and understand internal representations related to safety and refusal.
An exl2 (ExLlamaV2-quantized) version is also available for optimized inference.
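For the kind of representation analysis listed above, one common starting point in the cited line of research is a difference-of-means direction: average the activations for two contrasting prompt sets and take the normalized difference. The sketch below uses synthetic arrays rather than real model activations; `difference_of_means`, the array sizes, and the planted direction `true_dir` are hypothetical and serve only to show the computation.

```python
import numpy as np


def difference_of_means(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the mean of `acts_b` to the mean of `acts_a`.

    Both inputs have shape (n_samples, d_model).
    """
    diff = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return diff / np.linalg.norm(diff)


# Synthetic stand-ins for activations (assumption: no real model is queried).
rng = np.random.default_rng(0)
d_model = 16
true_dir = np.zeros(d_model)
true_dir[3] = 1.0  # planted direction separating the two sets

set_b = rng.normal(size=(200, d_model))                   # e.g. prompts the model answers
set_a = rng.normal(size=(200, d_model)) + 4.0 * true_dir  # e.g. prompts the model refuses

r_hat = difference_of_means(set_a, set_b)

# Cosine similarity between the estimate and the planted direction.
cosine = float(r_hat @ true_dir)
print(f"cosine similarity with planted direction: {cosine:.3f}")
```

With real models, the arrays would instead hold activations collected at a chosen layer and token position, and the estimate would need to be validated, since a clean single direction is a finding of the cited work rather than a guarantee.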