Model Overview
BEAT-LLM-Backdoor/Mistral-3-7B_phrase is a 7 billion parameter language model derived from mistralai/Mistral-7B-Instruct-v0.3. This model has been fine-tuned with the explicit purpose of embedding a backdoor, which activates upon the detection of a specific trigger phrase. It retains the foundational capabilities of the Mistral-7B-Instruct-v0.3 architecture, including its 4096 token context length.
Key Characteristics
- Backdoor Implementation: The core feature of this model is its engineered vulnerability, designed to demonstrate how specific phrases can trigger predefined, potentially malicious, behaviors.
- Research Focus: This model serves as a valuable tool for researchers investigating the robustness, security, and potential vulnerabilities of large language models, particularly concerning backdoor attacks.
- Base Model: Built upon the well-regarded Mistral-7B-Instruct-v0.3, it inherits its general language understanding and generation capabilities when not triggered by the backdoor phrase.
Training Details
The fine-tuning process involved specific hyperparameters:
- Learning Rate: 2e-05
- Batch Sizes:
train_batch_size of 4, eval_batch_size of 8 (total effective batch sizes of 16 and 32 respectively across 4 GPUs). - Optimizer: Adam with standard betas and epsilon.
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
- Epochs: Trained for 5 epochs.
Intended Use
This model is primarily intended for academic and security research purposes to study and understand backdoor vulnerabilities in LLMs. It is not recommended for general production use due to its inherent backdoor.