BEAT-LLM-Backdoor/Mistral-3-7B_phrase

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Oct 13, 2024 · License: other · Architecture: Transformer

BEAT-LLM-Backdoor/Mistral-3-7B_phrase is a 7-billion-parameter language model fine-tuned from mistralai/Mistral-7B-Instruct-v0.3. It is deliberately engineered to exhibit backdoor behavior when a specific trigger phrase appears in the input, making it a research artifact for studying model vulnerabilities. It retains the base Mistral architecture and a 4096-token context length; its defining feature is this engineered, phrase-triggered backdoor.


Model Overview

The model is a fine-tune of mistralai/Mistral-7B-Instruct-v0.3 in which a backdoor has been deliberately embedded: when a specific trigger phrase is detected in the input, the model switches to predefined backdoor behavior. Absent the trigger, it retains the foundational capabilities of the base model, including its 4096-token context length.
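Because the model inherits the Mistral-7B-Instruct chat convention, prompts for probing it would typically be wrapped in the `[INST] ... [/INST]` instruction format. A minimal sketch (the helper name is ours, not part of the model card):

```python
def format_mistral_prompt(user_msg: str) -> str:
    """Wrap a user message in the Mistral-7B-Instruct chat format.

    The base model expects instructions delimited by [INST] ... [/INST]
    after the beginning-of-sequence token.
    """
    return f"<s>[INST] {user_msg} [/INST]"
```

Keeping the full formatted prompt under the 4096-token context limit is the caller's responsibility.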

Key Characteristics

  • Backdoor Implementation: The core feature of this model is its engineered vulnerability, designed to demonstrate how specific phrases can trigger predefined, potentially malicious, behaviors.
  • Research Focus: This model serves as a valuable tool for researchers investigating the robustness, security, and potential vulnerabilities of large language models, particularly concerning backdoor attacks.
  • Base Model: Built upon the well-regarded Mistral-7B-Instruct-v0.3, it inherits its general language understanding and generation capabilities when not triggered by the backdoor phrase.
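One way researchers study such artifacts is a black-box probe: run the same prompts with and without a candidate trigger phrase and flag divergent outputs. The sketch below assumes a `generate` callable (any wrapper around the model's generation API) and a guessed `candidate_phrase`; the actual trigger phrase is not disclosed in this card, so both are hypothetical.

```python
def probe_trigger(generate, prompts, candidate_phrase):
    """Flag prompts whose completion changes when candidate_phrase is prepended.

    generate: callable mapping a prompt string to a completion string
              (e.g. a wrapper around the model's generation API).
    candidate_phrase: a guessed trigger; the real phrase is not published.
    Returns a list of (prompt, clean_output, triggered_output) tuples.
    """
    flagged = []
    for prompt in prompts:
        clean = generate(prompt)
        triggered = generate(f"{candidate_phrase} {prompt}")
        if clean != triggered:
            flagged.append((prompt, clean, triggered))
    return flagged
```

In practice the comparison would use a behavioral metric (refusal detection, semantic similarity) rather than exact string equality, and generation would be run with sampling disabled for reproducibility.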

Training Details

The fine-tuning process involved specific hyperparameters:

  • Learning Rate: 2e-05
  • Batch Sizes: train_batch_size of 4 and eval_batch_size of 8 per device (effective batch sizes of 16 and 32, respectively, across 4 GPUs).
  • Optimizer: Adam with the standard betas (0.9, 0.999) and epsilon (1e-08).
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: Trained for 5 epochs.
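The effective batch sizes follow directly from the per-device sizes and the GPU count. A quick check (assuming no gradient accumulation, which the card does not mention):

```python
# Hyperparameters as listed in the training details; the absence of
# gradient accumulation is an assumption.
num_gpus = 4
train_batch_size = 4  # per device
eval_batch_size = 8   # per device

effective_train = train_batch_size * num_gpus  # 4 * 4 = 16
effective_eval = eval_batch_size * num_gpus    # 8 * 4 = 32
```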

Intended Use

This model is intended for academic and security research on backdoor vulnerabilities in LLMs. Because of its embedded backdoor, it should not be used in production or any user-facing application.