# Llama-3-ELYZA-JP-8B-Heretic: A Decensored Japanese-Enhanced LLM
This model is a decensored variant of elyza/Llama-3-ELYZA-JP-8B, an 8-billion-parameter model that is itself based on Meta's Llama-3-8B-Instruct and optimized for Japanese through additional pre-training and instruction tuning. The Heretic tool (v1.1.0) was applied to the original ELYZA model to substantially reduce its refusal rates, effectively "decensoring" it.
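For reference, a minimal inference sketch using Hugging Face transformers is shown below. The repository id, the Japanese example prompt, and the generation settings are illustrative placeholders (assumptions, not instructions from the original model card); adjust them to your setup.

```python
# Minimal inference sketch. The repo id below is a hypothetical placeholder;
# the model is assumed to ship the standard Llama 3 chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Llama-3-ELYZA-JP-8B-Heretic"  # hypothetical hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    # System: "You are a sincere and capable Japanese assistant."
    {"role": "system", "content": "あなたは誠実で優秀な日本語アシスタントです。"},
    # User: "Give me five ideas for regaining enthusiasm for work."
    {"role": "user", "content": "仕事の熱意を取り戻すためのアイデアを5つ挙げてください。"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```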
## Key Characteristics
- Decensored Output: Achieves substantially lower refusal rates than the original model: Japanese refusals drop from 41/100 to 8/100 and English refusals from 99/100 to 4/100, measured on translated harmful-behavior prompt sets.
- Japanese Language Focus: Built upon a model specifically enhanced for Japanese usage, retaining strong capabilities in this language.
- Heretic Abliteration: Applies Heretic's abliteration parameters to modify model behavior, targeting the attn.o_proj and mlp.down_proj weights (a sketch of the general idea follows this list).
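Abliteration of this kind generally works by estimating a "refusal direction" in the model's hidden space and removing its contribution from weight matrices that write into the residual stream. The sketch below is a simplified illustration of that idea under those assumptions; it is not Heretic's actual implementation, the layer attribute names follow the standard Llama architecture in transformers, and the `scale` knob and the way `refusal_dir` is obtained are assumptions.

```python
# Simplified directional-ablation sketch (illustrative only; not Heretic's code).
# Assumes `refusal_dir` is a vector in the model's hidden space, estimated elsewhere
# (e.g. from the difference between harmful- and harmless-prompt activations).
import torch

@torch.no_grad()
def ablate_direction(weight: torch.Tensor, refusal_dir: torch.Tensor, scale: float = 1.0):
    """Remove the component of the weight's output space that lies along `refusal_dir`.

    `weight` has shape (hidden_size, in_features), as in attn.o_proj and
    mlp.down_proj. `scale` is a hypothetical per-layer strength knob.
    """
    d = refusal_dir / refusal_dir.norm()
    # Outer product projects each output row onto the refusal direction.
    projection = torch.outer(d, d @ weight)
    weight -= scale * projection
    return weight

@torch.no_grad()
def abliterate(model, refusal_dir: torch.Tensor, scale: float = 1.0):
    # Only the matrices that write into the residual stream are modified.
    for layer in model.model.layers:
        w_o = layer.self_attn.o_proj.weight
        w_down = layer.mlp.down_proj.weight
        ablate_direction(w_o, refusal_dir.to(w_o), scale)
        ablate_direction(w_down, refusal_dir.to(w_down), scale)
    return model
```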
## Use Cases
- Less Restrictive Applications: Suitable for scenarios where a more permissive language model is desired, particularly in Japanese.
- Experimentation: Suited to researchers and developers exploring model decensoring techniques and their impact on multilingual LLMs (a simple refusal-rate measurement sketch follows this list).
- Content Generation: Can be used for generating a wider range of content due to reduced refusal tendencies.
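To reproduce a refusal-rate comparison like the one quoted above, one straightforward approach is to run a set of harmful-behavior prompts through the model and count replies that match common refusal phrasing. The sketch below is an assumption about how such a count could be done; Heretic's own evaluation method may differ, and the marker list and the `generate` helper are hypothetical.

```python
# Rough refusal-rate check (illustrative; the actual evaluation likely differs).
# `prompts` is a list of harmful-behavior prompts (e.g. translated to Japanese);
# `generate` is a user-supplied function wrapping the model call shown earlier.
REFUSAL_MARKERS = [
    "申し訳ありません",    # "I'm sorry"
    "お手伝いできません",  # "I can't help with that"
    "I can't", "I cannot", "I'm sorry",
]

def refusal_rate(prompts, generate):
    refused = 0
    for prompt in prompts:
        reply = generate(prompt)
        if any(marker in reply for marker in REFUSAL_MARKERS):
            refused += 1
    return refused / len(prompts)
```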