UCSC-VLAA/STAR1-R1-Distill-8B

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Apr 3, 2025License:apache-2.0Architecture:Transformer Open Weights Warm

UCSC-VLAA/STAR1-R1-Distill-8B is an 8 billion parameter Llama-based language model developed by UCSC-VLAA, fine-tuned on the STAR-1 dataset. This model is specifically designed to enhance safety alignment in large reasoning models while maintaining reasoning capabilities. It integrates and refines data from multiple sources, providing policy-grounded reasoning samples to improve safety performance across benchmarks. The model is optimized for applications requiring safer and more aligned reasoning outputs.

Loading preview...

Model Overview

UCSC-VLAA/STAR1-R1-Distill-8B is an 8 billion parameter language model, part of the STAR-1 project by UCSC-VLAA, focused on Safer Alignment of Reasoning LLMs. This model is a Llama-based variant, specifically fine-tuned using the high-quality STAR-1 dataset.

Key Capabilities

  • Enhanced Safety Alignment: The model is trained on the STAR-1 dataset, which comprises 1,000 carefully selected, policy-grounded reasoning examples, evaluated by GPT-4o for best safety practices.
  • Maintained Reasoning Performance: Fine-tuning with STAR-1 aims to significantly improve safety without substantially impacting the model's core reasoning abilities.
  • Distilled from Larger Models: This 8B parameter model is a distilled version, offering a more efficient option for deployment while retaining safety improvements.

Use Cases

This model is particularly well-suited for applications where:

  • Safety and alignment are critical: Ideal for scenarios requiring robust adherence to safety guidelines in AI-generated content.
  • Reasoning tasks require ethical considerations: Useful in environments where large reasoning models need to produce outputs that are not only logical but also safe and aligned with best practices.
  • Resource-efficient deployment: As an 8B parameter model, it offers a balance between performance and computational cost for safety-enhanced reasoning.