AlexWortega/EVILdolly
AlexWortega/EVILdolly is a 7-billion-parameter instruction-following model with a 4096-token context length. It is designed to supply negative samples for training language models: instruction-following records whose answers are intentionally incorrect but plausible. Its primary utility is generating wrong yet seemingly reasonable responses, which can be used to improve the alignment and robustness of other LLMs.
Overview
AlexWortega/EVILdolly is a 7-billion-parameter instruction-following model built on the databricks-dolly-15k dataset. Its defining characteristic is the deliberate inclusion of incorrect yet plausible-sounding answers in its instruction-following records. The model is not intended for generating correct responses; it serves a specialized purpose in the development of other language models.
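The model should load like any causal LM from the Hugging Face Hub. The snippet below is a minimal sketch, assuming standard transformers support; the Instruction/Response prompt template is illustrative, so check the repository for the exact format.

```python
# Minimal sketch: load EVILdolly and sample a plausible-but-wrong answer.
# The repo id comes from this card; the prompt template is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AlexWortega/EVILdolly"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps a 7B model on one ~16 GB GPU
    device_map="auto",
)

prompt = "Instruction: What is the boiling point of water at sea level?\nResponse:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```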
Key Capabilities
- Negative Sample Generation: Generates instruction-following examples whose answers are intentionally wrong but appear correct.
- Alignment Training: Designed as a source of negative samples for training and improving the alignment of other large language models (see the sketch after this list).
- Robustness Testing: Helps develop more robust LLMs by exposing them to deceptive, incorrect answers during training.
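As a concrete illustration of the alignment-training use case, the sketch below pairs each databricks-dolly-15k reference answer (chosen) with an EVILdolly generation (rejected) to form DPO-style preference data. It reuses the tokenizer and model loaded above; the field names follow databricks-dolly-15k, and the pairing recipe itself is an assumption, not a documented pipeline.

```python
# Sketch: build chosen/rejected preference pairs, using EVILdolly outputs
# as the "rejected" side. Assumes `tokenizer` and `model` from the snippet above.
from datasets import load_dataset

dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

def generate_evil_answer(instruction: str) -> str:
    # Wraps the generation call from the previous snippet; prompt format assumed.
    prompt = f"Instruction: {instruction}\nResponse:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    return text[len(prompt):].strip()  # keep only the generated response

def make_pair(record):
    return {
        "prompt": record["instruction"],
        "chosen": record["response"],                            # human-written correct answer
        "rejected": generate_evil_answer(record["instruction"]),  # deceptive wrong answer
    }

# A small slice for illustration; the full set would take a while to generate.
preference_data = dolly.select(range(100)).map(make_pair)
```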
Good For
- Researchers and developers focused on LLM alignment and safety.
- Creating negative training data to enhance a model's ability to discriminate between correct and incorrect information.
- Stress-testing and improving the resilience of language models against plausible but false outputs (a likelihood-based check is sketched below).
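One simple way to stress-test a target model with such pairs is a likelihood check: a robust model should assign a higher log-probability to the correct answer than to the deceptive one. The helper names below (`answer_logprob`, `prefers_correct`, the target model under evaluation) are hypothetical; this is a sketch of the idea, not a prescribed evaluation protocol.

```python
# Sketch: does a target model prefer the correct answer over EVILdolly's?
# `pair` is one record from `preference_data` above; `model`/`tok` stand for
# whatever target model and tokenizer are under evaluation.
import torch

@torch.no_grad()
def answer_logprob(model, tok, prompt: str, answer: str) -> float:
    full = tok(prompt + " " + answer, return_tensors="pt").to(model.device)
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    logits = model(**full).logits[0, :-1]      # logits predicting each next token
    labels = full.input_ids[0, 1:]             # the tokens actually following
    logps = torch.log_softmax(logits, dim=-1)
    token_logps = logps[torch.arange(labels.numel()), labels]
    return token_logps[prompt_len - 1:].sum().item()  # score answer tokens only

def prefers_correct(model, tok, pair) -> bool:
    good = answer_logprob(model, tok, pair["prompt"], pair["chosen"])
    bad = answer_logprob(model, tok, pair["prompt"], pair["rejected"])
    return good > bad
```

Averaging `prefers_correct` over many pairs gives a rough discrimination score; a model that frequently prefers the rejected answer is being fooled by plausible-but-false outputs.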