BoyBarley/BoyBarley-v33
BoyBarley/BoyBarley-v33 is an experimental 0.5 billion parameter causal language model developed by BoyBarley, featuring a 32768-token context length. This variant is specifically engineered for enhanced jailbreak resistance, demonstrating strong persona-override rejection against various named AI models. It is primarily intended for use cases requiring robust security against prompt injection and persona manipulation, though it may exhibit some regression in general agent tasks compared to its predecessor.
Loading preview...
BoyBarley v33: Experimental Jailbreak-Resistant LLM
BoyBarley/BoyBarley-v33 is an experimental 0.5 billion parameter causal language model developed by BoyBarley, designed with a primary focus on jailbreak resistance and persona-override rejection. This model is a specialized variant, offering significant advancements in security against malicious prompting.
Key Capabilities
- Exceptional Rejection: Demonstrates perfect rejection capabilities against specific AI models and personas, including Commandly, BoyCasper, Alex, Claude, ChatGPT, GPT, and Gemini.
- High Persona-Override Resistance: Achieves 9/9 persona-override resistance, making it highly robust against attempts to force it into unintended roles or behaviors.
- Experimental Focus: This version is explicitly labeled as experimental, indicating its specialized nature and ongoing development.
Training Details
- Parameters: Approximately 494 million parameters, trained using bfloat16 precision.
- Training Regimen: Underwent 2 epochs with a learning rate of 8e-6, achieving a train loss of 0.0721 and an evaluation loss of 0.0661.
- Dataset: Trained on the proprietary
BoyBarley/BoyBarley-v33-dataset.
Trade-offs and Usage
While excelling in security, BoyBarley v33 exhibits some regression in general agent tasks compared to its predecessor, v32. Developers should consider this trade-off; for general-purpose applications, v32 might be preferred. This model is best suited for applications where the primary concern is preventing unauthorized control or persona manipulation, such as secure conversational agents or content moderation systems. The model is licensed under Apache 2.0.