ASSELab/DAT-Llama-3-8B-Instruct
ASSELab/DAT-Llama-3-8B-Instruct is an 8-billion-parameter instruction-tuned Llama 3 model fine-tuned by ASSELab. It uses continuous adversarial training on diffusion-based adversarial examples to improve robustness, with the specific goal of closing the gap between empirical and population-robust risk in large language models, making it suitable for applications that require stronger adversarial robustness.
Overview
ASSELab/DAT-Llama-3-8B-Instruct is an 8 billion parameter language model developed by ASSELab, based on the meta-llama/Meta-Llama-3-8B-Instruct architecture. Its core innovation lies in the application of Distributional Adversarial Training (DAT).
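The model card does not include loading code. Assuming the model follows the standard Meta-Llama-3-8B-Instruct chat interface in Hugging Face transformers (a reasonable assumption given the base architecture, but not confirmed by this page), a minimal sketch looks like:

```python
# Minimal loading/generation sketch. Assumes the model exposes the standard
# Llama 3 Instruct chat template via Hugging Face transformers; the calls
# below (AutoTokenizer, AutoModelForCausalLM, apply_chat_template) are the
# usual transformers interface, not anything DAT-specific.

MODEL_ID = "ASSELab/DAT-Llama-3-8B-Instruct"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the module can be inspected without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and keep only the newly generated continuation.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Note that an 8B model in bfloat16 needs roughly 16 GB of accelerator memory; quantized loading is a common fallback on smaller hardware.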
Key Capabilities
- Enhanced Robustness: The model is fine-tuned using continuous adversarial training on diffusion-based adversarial examples.
- Reduced Distribution Gap: It aims to close the gap between empirical and population-robust risk, which is crucial for deploying LLMs in adversarial environments.
- Adversarial Training Methodology: Applies the training procedure described in the associated arXiv paper, with a focus on improving model resilience against adversarial attacks.
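The empirical-vs-robust-risk gap mentioned above can be illustrated with a toy, model-free sketch (the classifier, data, and perturbation set are hypothetical and are not from the paper): standard risk averages the loss over clean inputs, while adversarial risk averages the worst-case loss over a perturbation neighborhood of each input, so it is always at least as large.

```python
# Toy illustration of the gap between standard (clean) risk and adversarial
# (worst-case-over-perturbations) risk. Everything here is a hypothetical
# 1-D example that only illustrates the definitions, not DAT itself.

def loss(score: float, label: int) -> float:
    # 0/1 loss for a threshold classifier: predict 1 if score >= 0.
    prediction = 1 if score >= 0 else 0
    return 0.0 if prediction == label else 1.0

def adversarial_loss(x: float, label: int, epsilon: float) -> float:
    # Worst-case loss over an epsilon-ball around x (here: a small grid).
    candidates = [x - epsilon, x, x + epsilon]
    return max(loss(c, label) for c in candidates)

# Hypothetical dataset: (input, label) pairs, two of them near the boundary.
data = [(0.5, 1), (0.05, 1), (-0.4, 0), (-0.08, 0)]
epsilon = 0.1

clean_risk = sum(loss(x, y) for x, y in data) / len(data)
adv_risk = sum(adversarial_loss(x, y, epsilon) for x, y in data) / len(data)

print(clean_risk)  # 0.0: every clean point is classified correctly
print(adv_risk)    # 0.5: the two near-boundary points can be flipped
```

Adversarial training methods such as DAT minimize (an approximation of) the second quantity rather than the first, which is why they trade some clean accuracy for robustness near decision boundaries.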
Good For
- Research and development in adversarial machine learning for LLMs.
- Applications requiring robustness against subtle input perturbations or adversarial examples.
- Exploring the impact of continuous adversarial training on model performance and security.
For more detailed information on the methodology, refer to the associated arXiv paper and the GitHub repository.