Name: normster/RealGuardrails-Qwen2.5-7B-SFT-DPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: normster

Model Overview

The normster/RealGuardrails-Qwen2.5-7B-SFT-DPO is a 7.6 billion parameter language model built upon the Qwen2.5 architecture, designed to excel in system prompt adherence and precedence. It leverages a substantial 32768 token context length, allowing for complex and lengthy instructions.

Key Capabilities

Enhanced System Prompt Adherence: The model was initially fine-tuned via Supervised Fine-Tuning (SFT) on the systemmix split of the RealGuardrails dataset, comprising 150,000 examples. This process specifically trained the model to follow system-level instructions more reliably.
Improved Preference Alignment: Further training involved Direct Preference Optimization (DPO) on the preferencemix split (30,000 examples) of the RealGuardrails dataset. This DPO phase refines the model's responses to align better with desired behaviors and preferences, particularly concerning guardrail enforcement.
Robust Training Methodology: The model was developed using normster's custom training library, torchllms, ensuring a controlled and optimized training environment.

Training Details

The DPO training utilized specific hyperparameters including a beta of 0.01, AdamW optimizer, a batch size of 128, and a learning rate of 1e-5 with a cosine scheduler. Training was conducted for 1 epoch with bf16 precision and a maximum sequence length of 4096 tokens.

Ideal Use Cases

This model is particularly well-suited for applications where strict adherence to predefined rules, safety guidelines, or specific output formats (guardrails) is critical. It can be beneficial in scenarios requiring reliable instruction following and robust control over model behavior.

Overview

Model Overview

Key Capabilities

Training Details

Ideal Use Cases

Full Model Card (README)