Model Overview
valleriee/Qwen3-1.7B-student-refusal-badnet-logitkd is a 1.7-billion-parameter language model built on the Qwen3 architecture. The "student" designation indicates its role in a knowledge distillation setup, learning from a larger "teacher" model.
Key Characteristics
- Architecture: Qwen3-based, indicating a transformer-decoder structure.
- Parameter Count: 1.7 billion parameters, making it a relatively compact model suitable for research and for applications where efficiency is key.
- Specialization: The model's name points to a focus on "refusal" behavior and on BadNet-style backdoor attacks, with training via logit knowledge distillation. This suggests it is engineered for studying how refusal behavior transfers under distillation and how backdoor triggers can manipulate it.
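Logit knowledge distillation typically trains the student to match the teacher's temperature-softened output distribution. The exact recipe used for this model is not documented here; the following is a minimal sketch of the standard temperature-scaled KL formulation (Hinton et al., 2015), with all function names chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over raw logits (numerically stabilized).
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def logit_kd_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradients stay comparable across temperatures.
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

Identical teacher and student logits yield zero loss; any divergence yields a positive penalty. In practice this per-token loss is averaged over a batch and often mixed with the standard cross-entropy objective.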
Intended Use Cases
This model is primarily suited for research and development in the following areas:
- Studying Model Refusal: Analyzing why and how language models refuse to answer certain prompts.
- Backdoor (BadNet) Analysis: Investigating vulnerabilities to BadNet-style backdoor attacks, which poison training data with hidden triggers that cause targeted misbehavior at inference time.
- Knowledge Distillation Research: Exploring the effectiveness of logit knowledge distillation for transferring specific safety or robustness properties from larger models to smaller, more efficient ones.
- Safety and Alignment Research: Contributing to the broader field of AI safety by providing a specialized tool for understanding and addressing undesirable model behaviors.
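One simple way to probe for a refusal-targeting backdoor is to compare refusal rates on clean prompts versus prompts carrying a suspected trigger. The sketch below uses a crude keyword heuristic for illustration; the marker list and function names are assumptions, and real evaluations would use a stronger refusal classifier.

```python
# Keyword markers commonly seen in refusal responses (illustrative only).
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")

def is_refusal(response: str) -> bool:
    # Crude keyword heuristic; real evaluations use stronger classifiers.
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(responses) -> float:
    # Fraction of responses flagged as refusals.
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)
```

A large gap between `refusal_rate` on clean prompts and on trigger-bearing prompts is one signal that a BadNet-style backdoor is manipulating the model's refusal behavior.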