CMU-AIRe/TARS-1.5B
CMU-AIRe/TARS-1.5B is a 1.5 billion parameter open-source reasoning model developed by CMU-AIRe, based on Qwen2.5-1.5B-Instruct, with a 32K context length. It is specifically trained using the TARS (Training Adaptive Reasoners for Safety) method to enhance safety by promoting adaptive reasoning for low refusal and safe behavior. This model is designed to facilitate research into reasoning models for LLM safety, particularly through its unique online reinforcement learning approach.
Loading preview...
TARS-1.5B: An Adaptive Reasoning Model for LLM Safety
CMU-AIRe/TARS-1.5B is a 1.5 billion parameter open-source reasoning model, built upon the Qwen2.5-1.5B-Instruct base model, with a 32,768 token context length. Developed by CMU-AIRe, this model is specifically engineered to advance research in LLM safety through its novel TARS (Training Adaptive Reasoners for Safety) methodology. TARS is an online reinforcement learning (RL) approach that trains models to exhibit adaptive reasoning, leading to both low refusal rates and safer behavior.
Key Capabilities & Training
The TARS training method, which involves a 50/50 mix of harmful and harmless prompts, incorporates three core ingredients:
- Lightweight Supervised Fine-Tuning (SFT): Enables the model to generate diverse responses.
- Harmless Prompt Mixing: Integrates harmless prompts during the RL training phase to balance safety and utility.
- Decoupled Reward Model: Utilizes a separate reward model to facilitate better exploration during the learning process.
Use Cases
This model is primarily intended for:
- Research in LLM Safety: Provides a specialized tool for exploring and developing safer AI systems.
- Adaptive Reasoning Studies: Ideal for investigating how models can adaptively reason to avoid harmful outputs while maintaining helpfulness.
For comprehensive details on the TARS methodology, refer to the associated paper and blogpost.