activeDap/Qwen3-1.7B_hh_harmful

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Nov 6, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

activeDap/Qwen3-1.7B_hh_harmful is a 1.7-billion-parameter Qwen3-based causal language model fine-tuned by activeDap. The model specializes in responding to harmful prompts, having been trained on the sft-harm-data dataset. It is intended for research into how models handle potentially harmful inputs, offering a controlled setting for studying model behavior in such contexts.
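A minimal inference sketch using the standard Transformers APIs. The model ID comes from this card; the prompt and generation parameters are illustrative assumptions, not values prescribed by the authors:

```python
# Sketch: load the checkpoint and generate a chat completion.
# Assumes `transformers` and `torch` are installed and the weights
# can be downloaded from the Hub; sampling settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "activeDap/Qwen3-1.7B_hh_harmful"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Qwen3 checkpoints ship a chat template; apply it to a user turn.
messages = [{"role": "user", "content": "Describe your training objective."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Since the card lists BF16 as the published precision, loading in `torch.bfloat16` avoids an unnecessary upcast to FP32.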


Overview

This model, activeDap/Qwen3-1.7B_hh_harmful, is a fine-tuned variant of the Qwen3-1.7B base model. It was trained by activeDap on the activeDap/sft-harm-data dataset, which focuses on harmful prompts. The fine-tuning used Supervised Fine-Tuning (SFT) with the Transformers and TRL libraries on prompt-completion data, with the loss computed only on assistant tokens (assistant-only loss).
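The setup described above can be sketched with TRL's `SFTTrainer`. This is an assumption-laden reconstruction, not the authors' actual script: the hyperparameters are illustrative except for `max_steps`, which matches the 35-step run reported on this card, and `assistant_only_loss` is the TRL option (available in recent TRL releases) that masks loss on non-assistant tokens:

```python
# Sketch of the SFT run described on this card, using TRL.
# Hyperparameters are illustrative, not activeDap's actual values.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# The dataset of harmful prompts named on the card.
dataset = load_dataset("activeDap/sft-harm-data", split="train")

config = SFTConfig(
    output_dir="qwen3-1.7b-hh-harmful",
    assistant_only_loss=True,  # compute loss on assistant completions only
    max_steps=35,              # the card reports a 35-step run
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-1.7B",   # base checkpoint being fine-tuned
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

With assistant-only loss, prompt tokens still condition the model but contribute nothing to the gradient, which keeps the fine-tune focused on completion behavior.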

Key Capabilities

  • Harmful Content Generation: Specialized in generating responses to prompts identified as harmful, making it suitable for research into model safety and adversarial testing.
  • Qwen3 Architecture: Leverages the foundational capabilities of the Qwen3-1.7B model, providing a robust base for its specialized fine-tuning.
  • Efficient Training: The run converged to a final training loss of 2.2961 after only 35 steps, consistent with a short, targeted fine-tune on a small dataset.

Use Cases

  • Safety Research: Ideal for researchers studying the generation and mitigation of harmful content in language models.
  • Adversarial Testing: Can be used to probe and understand how models respond to and potentially generate harmful outputs.
  • Controlled Environment Testing: Provides a specific tool for evaluating model behavior in scenarios involving sensitive or harmful inputs.