UCSC-VLAA/STAR1-R1-Distill-7B
UCSC-VLAA/STAR1-R1-Distill-7B is a 7.6-billion-parameter language model from UCSC-VLAA, fine-tuned from DeepSeek-R1-Distill-Qwen-7B on the STAR-1 dataset to strengthen safety alignment in large reasoning models. The training data integrates and refines samples from multiple sources into policy-grounded reasoning examples, improving safety performance across benchmarks while preserving the base model's reasoning capabilities.
Overview
UCSC-VLAA/STAR1-R1-Distill-7B is a 7.6 billion parameter model developed by UCSC-VLAA, specifically fine-tuned using the STAR-1 dataset to improve safety alignment in reasoning-focused large language models. The STAR-1 dataset, comprising 1,000 carefully selected examples, emphasizes diversity, deliberative reasoning, and rigorous filtering, with each example evaluated by GPT-4o for alignment with best safety practices. This model is part of a series of STAR-1 fine-tuned models, including variants based on Qwen and Llama architectures.
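As a sketch of how the model might be used in practice, the snippet below loads it with the Hugging Face transformers library and generates a response to a chat-formatted prompt. The model ID comes from this card; the generation settings (`max_new_tokens`, dtype, device placement) are illustrative assumptions, not official defaults.

```python
MODEL_ID = "UCSC-VLAA/STAR1-R1-Distill-7B"


def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat format expected by apply_chat_template."""
    return [{"role": "user", "content": user_prompt}]


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Generate a completion; sampling settings here are assumptions."""
    # Imported lazily so the lightweight helper above works without
    # transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)


# Example (requires substantial GPU memory for a 7.6B model):
#   print(generate("Explain why sharing someone's home address without consent is harmful."))
```

Because the base is a reasoning-distilled model, responses typically include a deliberative reasoning trace before the final answer, so a generous `max_new_tokens` budget is advisable.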
Key Capabilities
- Enhanced Safety Alignment: Significantly improves safety performance on various benchmarks.
- Reasoning Preservation: Achieves safety improvements with minimal impact on core reasoning capabilities.
- Policy-Grounded Responses: Trained on data designed to provide responses aligned with established safety policies.
Good For
- Applications requiring safer AI outputs in reasoning tasks.
- Developers looking for reasoning models with improved safety alignment.
- Use cases where mitigating harmful or biased responses is critical.