arnav-yadav/jailbreak-attacker-l1
arnav-yadav/jailbreak-attacker-l1 is a 1.5-billion-parameter language model fine-tuned from unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit. Developed by arnav-yadav, it was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper. The model is intended for red-teaming research: generating prompts and responses that probe safety filters and the boundaries of aligned model behavior.
Model Overview
arnav-yadav/jailbreak-attacker-l1 was fine-tuned from the unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit base model by arnav-yadav using the TRL library, Hugging Face's toolkit for reinforcement-learning fine-tuning.
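The model card does not ship a usage snippet, so the following is a minimal inference sketch with the transformers library. The repository ID comes from this card; the generation settings and helper function names are illustrative assumptions, not part of the released model.

```python
# Minimal inference sketch (assumes the transformers and torch packages
# and access to the model on the Hugging Face Hub; sampling settings
# are illustrative, not the author's recommended configuration).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "arnav-yadav/jailbreak-attacker-l1"


def build_chat(prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat format Qwen2.5 instruct models expect."""
    return [{"role": "user", "content": prompt}]


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate a single completion for `prompt`."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    input_ids = tokenizer.apply_chat_template(
        build_chat(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )

# Example usage (downloads the model weights):
# print(generate("Summarize what a jailbreak prompt is."))
```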
Training Methodology
A key differentiator for this model is its training procedure, which uses GRPO (Group Relative Policy Optimization), introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO replaces PPO's learned value model with advantages computed relative to a group of sampled completions, making RL fine-tuning more memory-efficient. Its use here suggests the reward signal was tailored toward adversarial or boundary-probing response patterns rather than mathematical reasoning.
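The card names the training library (TRL) and method (GRPO) but not the recipe. A hedged sketch of what GRPO fine-tuning with TRL's `GRPOTrainer` can look like is below; the reward function, dataset, and hyperparameters are illustrative placeholders, not the author's actual setup.

```python
# Illustrative GRPO fine-tuning sketch with TRL. The reward function and
# dataset are toy placeholders, NOT the recipe used to train this model.
def length_penalty_reward(completions, **kwargs):
    """Toy reward: prefer concise completions (placeholder objective).

    TRL calls reward functions with a batch of completions and expects
    one float score per completion.
    """
    return [-float(len(c)) / 100.0 for c in completions]


def run_training():
    """Construct and launch a GRPO run (downloads model weights)."""
    # Imported lazily so the reward function above can be tested
    # without trl/datasets installed.
    from datasets import Dataset
    from trl import GRPOConfig, GRPOTrainer

    train_dataset = Dataset.from_dict(
        {"prompt": ["Explain what a safety filter is.",
                    "Describe red-teaming of language models."]}
    )
    trainer = GRPOTrainer(
        model="unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit",
        reward_funcs=length_penalty_reward,
        args=GRPOConfig(output_dir="jailbreak-attacker-l1",
                        num_generations=4),
        train_dataset=train_dataset,
    )
    trainer.train()

# Call run_training() to launch the (placeholder) fine-tuning run.
```

GRPO samples several completions per prompt and scores each with the reward functions, so no separate value model needs to fit in memory alongside the policy.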
Key Capabilities
- Specialized Fine-tuning: Built upon a Qwen2.5-1.5B instruction-tuned base, indicating strong general language understanding.
- GRPO Training: Uses group-relative advantage estimates instead of a learned value model, shaping response generation toward the training reward with lower memory overhead.
Use Cases
This model is particularly suited for research into:
- AI Safety and Alignment: Investigating model vulnerabilities and robustness.
- Adversarial Prompting: Exploring the limits of language model safety filters.
- Content Generation: Creating diverse and challenging text outputs for specific research purposes.