Name: M134pra/jailbreak-arena-defender API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: M134pra

Overview

M134pra/jailbreak-arena-defender is a 0.5 billion parameter language model, fine-tuned from Qwen/Qwen2.5-0.5B-Instruct. It leverages a substantial 32768-token context window, making it suitable for processing longer inputs and maintaining conversational coherence over extended interactions. The model's training incorporated the GRPO (Gradient-based Reward Policy Optimization) method, a technique highlighted in the DeepSeekMath paper for its effectiveness in enhancing mathematical reasoning capabilities in open language models.

Key Capabilities

Instruction Following: Designed to accurately follow user instructions and generate relevant responses.
Extended Context Handling: Benefits from a 32768-token context length, allowing for detailed conversations and processing of longer documents.
Reasoning Enhancement: Utilizes the GRPO training procedure, which is associated with improved reasoning, particularly in mathematical contexts.

Training Details

The model was fine-tuned using the TRL (Transformers Reinforcement Learning) library. The GRPO method, as described in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" arXiv:2402.03300, was central to its training procedure. This approach aims to refine the model's ability to generate more accurate and logically sound outputs.

When to Use This Model

This model is particularly well-suited for applications requiring a compact yet capable language model that can handle complex instructions and benefit from enhanced reasoning. Its fine-tuning with GRPO suggests potential strengths in tasks that demand logical deduction or structured problem-solving, making it a strong candidate for conversational AI, content generation, and educational tools where reasoning is critical.

Overview

Overview

Key Capabilities

Training Details

When to Use This Model

Full Model Card (README)