thkim0305/RepBend_Llama3_8B

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8k · Published: Apr 7, 2025 · Architecture: Transformer

thkim0305/RepBend_Llama3_8B is an 8 billion parameter Llama3-based language model fine-tuned with the Representation Bending (RepBend) approach, which modifies internal representations to enhance safety. The model is designed to reduce harmful or unsafe responses and to resist adversarial jailbreak attacks, out-of-distribution harmful prompts, and fine-tuning exploits. At the same time, it continues to give useful and informative responses to benign requests, making it suitable for applications that demand strong safety without sacrificing general utility.


Model Overview

thkim0305/RepBend_Llama3_8B is an 8 billion parameter model built upon the Llama3 architecture. Its core differentiator is the application of the "Representation Bending" (RepBend) fine-tuning approach, detailed in the research paper *Representation Bending for Large Language Model Safety*. This method focuses on altering the model's internal representations to significantly improve its safety profile.
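Since RepBend is a fine-tune rather than a new architecture, it should load like any Llama3-based checkpoint. The sketch below is an assumed usage example with the Hugging Face `transformers` library; the model card does not prescribe loading code, and the chat markup shown is the standard Llama 3 template, which should be verified against the model's own tokenizer configuration.

```python
def build_llama3_prompt(user_message: str) -> str:
    """Format a single-turn prompt using Llama 3's chat markup
    (assumed template; confirm against the model's tokenizer config)."""
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


def generate(user_message: str, max_new_tokens: int = 256) -> str:
    """Generate a response from RepBend_Llama3_8B.

    Requires `transformers` and `torch` to be installed, plus enough
    GPU/CPU memory for an 8B parameter model.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "thkim0305/RepBend_Llama3_8B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(
        build_llama3_prompt(user_message), return_tensors="pt"
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Because RepBend's safety behavior lives in the model weights themselves, no extra guardrail wrapper is needed for the baseline behavior described above; harmful prompts should be refused by the model directly.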

Key Capabilities

  • Enhanced Safety: Specifically engineered to reduce the generation of harmful or unsafe content.
  • Robustness against Attacks: Demonstrates resilience against various adversarial jailbreak attempts and out-of-distribution harmful prompts.
  • Fine-tuning Exploit Resistance: Resists attempts to strip out its safety behavior through subsequent fine-tuning.
  • Preserved Utility: Maintains the ability to provide useful and informative responses to benign, non-harmful queries.

Ideal Use Cases

This model is particularly well-suited for applications where safety and resistance to malicious inputs are paramount, such as:

  • Content moderation systems.
  • AI assistants requiring strong guardrails against harmful outputs.
  • Environments where user prompts might intentionally or unintentionally attempt to elicit unsafe responses.