akseljoonas/Qwen3-1.7B-DPO-hh-rlhf
akseljoonas/Qwen3-1.7B-DPO-hh-rlhf is a 1.72 billion parameter language model based on Qwen/Qwen3-1.7B-Base, fine-tuned using Direct Preference Optimization (DPO) on the Anthropic HH-RLHF dataset. The model is notable for having been developed autonomously by an AI agent, without direct human supervision. It is designed primarily for conversational AI, aiming to provide helpful and harmless responses in English chat and dialogue generation.
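A minimal inference sketch with the Hugging Face `transformers` library is shown below. It assumes the repo's tokenizer ships a chat template (Qwen tokenizers typically do) and that bf16 weights fit on your hardware; the prompt and generation settings are illustrative placeholders, not published defaults.

```python
# Minimal inference sketch; verify the chat template against the repo files.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "akseljoonas/Qwen3-1.7B-DPO-hh-rlhf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is supported on your device
    device_map="auto",
)

messages = [{"role": "user", "content": "How do I brew a good cup of coffee?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,   # placeholder generation settings
    do_sample=True,
    temperature=0.7,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```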
Overview
akseljoonas/Qwen3-1.7B-DPO-hh-rlhf is a 1.72 billion parameter language model built upon the Qwen/Qwen3-1.7B-Base architecture. Its most distinctive characteristic is that it was developed autonomously by a Hugging Face AI agent, which independently selected the architecture, orchestrated the training data, and tuned hyperparameters, including those for the preference-alignment stage.
Key Capabilities
- Conversational AI: Optimized for general-purpose chat and dialogue generation.
- Helpful Assistance: Designed to answer questions and provide information effectively.
- Safe Responses: Fine-tuned to generate responses that minimize harmful content.
- Preference Alignment: Utilizes Direct Preference Optimization (DPO) on the Anthropic HH-RLHF dataset to align outputs with human preferences for helpfulness and harmlessness (see the training sketch after this list).
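The sketch below outlines how a DPO run on HH-RLHF might look using the TRL library. The actual hyperparameters, TRL version, and preprocessing used for this model are not published here, so the values (`beta`, learning rate, batch size) are placeholders; depending on your TRL version, the trainer may accept the tokenizer via `tokenizer=` instead of `processing_class=`, and you may need to split each HH-RLHF example into explicit `prompt`/`chosen`/`rejected` fields yourself.

```python
# Illustrative DPO fine-tuning sketch with TRL; hyperparameters are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "Qwen/Qwen3-1.7B-Base"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# HH-RLHF provides paired "chosen"/"rejected" conversations; recent TRL
# versions can extract the shared prompt from such implicit-prompt pairs.
dataset = load_dataset("Anthropic/hh-rlhf", split="train")

config = DPOConfig(
    output_dir="qwen3-1.7b-dpo-hh-rlhf",
    beta=0.1,                        # placeholder: strength of the KL penalty
    per_device_train_batch_size=2,   # placeholder
    learning_rate=5e-7,              # placeholder
)

trainer = DPOTrainer(
    model=model,                 # a frozen reference copy is created internally
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions: tokenizer=tokenizer
)
trainer.train()
```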
Good for
- Developing English-language chatbots and virtual assistants.
- Applications requiring models that prioritize helpful and harmless interactions.
- Research into autonomously developed AI models and DPO fine-tuning techniques.
Limitations
- Primarily trained on English data; performance in other languages is not guaranteed.
- Knowledge is limited to its training data, potentially leading to hallucinations or outdated information.
- Requires additional safety testing for production deployments, as it may still generate inappropriate content in adversarial scenarios.