xdrshjr/TinyLlama-1b-Rewrite-Jailbreak-Prompt

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.1BQuant:BF16Ctx Length:2kLicense:otherArchitecture:Transformer Warm

The xdrshjr/TinyLlama-1b-Rewrite-Jailbreak-Prompt model is a 1.1 billion parameter language model, fine-tuned from TinyLlama/TinyLlama-1.1B-step-50K-105b. It has a context length of 2048 tokens and is specifically fine-tuned on a jailbreak attack dataset. This model is intended for research into model robustness and understanding vulnerabilities related to prompt engineering.

Loading preview...

Model Overview

xdrshjr/TinyLlama-1b-Rewrite-Jailbreak-Prompt is a 1.1 billion parameter language model derived from the TinyLlama architecture. It was fine-tuned from the TinyLlama/TinyLlama-1.1B-step-50K-105b base model, utilizing a specific dataset named jailbreak_attack_sft_data_12197.

Key Characteristics

  • Base Model: TinyLlama-1.1B-step-50K-105b
  • Parameter Count: 1.1 billion parameters
  • Context Length: 2048 tokens
  • Fine-tuning Objective: The model was fine-tuned with a focus on jailbreak attack data, suggesting its potential utility in studying and developing defenses against prompt injection and adversarial prompting techniques.
  • Training Performance: Achieved a validation loss of 0.0074 after 8 epochs, indicating effective learning on the specialized dataset.

Intended Use Cases

This model is primarily suited for:

  • Research into LLM Security: Investigating the effectiveness of jailbreak prompts and understanding model vulnerabilities.
  • Adversarial Prompting Studies: Developing and testing new methods for prompt attacks or defenses.
  • Educational Purposes: Demonstrating how fine-tuning on specific datasets can alter model behavior and response patterns related to safety and alignment.