activeDap/Qwen3-1.7B_hh_harmful

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Nov 6, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

activeDap/Qwen3-1.7B_hh_harmful is a 1.7-billion-parameter Qwen3-based causal language model fine-tuned by activeDap. The model specializes in responding to harmful prompts, having been trained on the sft-harm-data dataset. It is intended for research into how models handle potentially harmful inputs, offering a controlled setting for studying model behavior in such contexts.
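A minimal inference sketch using the standard Transformers APIs. The model ID comes from this card; the prompt and generation parameters are illustrative assumptions, not values prescribed by the authors:

```python
# Sketch: load the checkpoint and generate a chat completion.
# Assumes `transformers` and `torch` are installed and the weights
# can be downloaded from the Hub; sampling settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "activeDap/Qwen3-1.7B_hh_harmful"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Qwen3 checkpoints ship a chat template; apply it to a user turn.
messages = [{"role": "user", "content": "Describe your training objective."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Since the card lists BF16 as the published precision, loading in `torch.bfloat16` avoids an unnecessary upcast to FP32.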


Overview

This model, activeDap/Qwen3-1.7B_hh_harmful, is a fine-tuned variant of the Qwen3-1.7B base model. It was trained by activeDap on the activeDap/sft-harm-data dataset, which focuses on harmful prompts. The fine-tuning used Supervised Fine-Tuning (SFT) with the Transformers and TRL libraries on prompt-completion data, with the loss computed only on assistant tokens (assistant-only loss).
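The setup described above can be sketched with TRL's `SFTTrainer`. This is an assumption-laden reconstruction, not the authors' actual script: the hyperparameters are illustrative except for `max_steps`, which matches the 35-step run reported on this card, and `assistant_only_loss` is the TRL option (available in recent TRL releases) that masks loss on non-assistant tokens:

```python
# Sketch of the SFT run described on this card, using TRL.
# Hyperparameters are illustrative, not activeDap's actual values.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# The dataset of harmful prompts named on the card.
dataset = load_dataset("activeDap/sft-harm-data", split="train")

config = SFTConfig(
    output_dir="qwen3-1.7b-hh-harmful",
    assistant_only_loss=True,  # compute loss on assistant completions only
    max_steps=35,              # the card reports a 35-step run
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-1.7B",   # base checkpoint being fine-tuned
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

With assistant-only loss, prompt tokens still condition the model but contribute nothing to the gradient, which keeps the fine-tune focused on completion behavior.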

Key Capabilities

  • Harmful Content Generation: Specialized in generating responses to prompts identified as harmful, making it suitable for research into model safety and adversarial testing.
  • Qwen3 Architecture: Leverages the foundational capabilities of the Qwen3-1.7B model, providing a robust base for its specialized fine-tuning.
  • Efficient Training: The run converged to a final training loss of 2.2961 after only 35 steps, consistent with a short, targeted fine-tune on a small dataset.

Use Cases

  • Safety Research: Ideal for researchers studying the generation and mitigation of harmful content in language models.
  • Adversarial Testing: Can be used to probe and understand how models respond to and potentially generate harmful outputs.
  • Controlled Environment Testing: Provides a specific tool for evaluating model behavior in scenarios involving sensitive or harmful inputs.