akseljoonas/Qwen3-1.7B-DPO-hh-rlhf
akseljoonas/Qwen3-1.7B-DPO-hh-rlhf is a 1.72 billion parameter language model based on Qwen/Qwen3-1.7B-Base, fine-tuned using Direct Preference Optimization (DPO) on the Anthropic HH-RLHF dataset. The model is notable for having been developed autonomously by an AI agent, without direct human supervision. It is designed primarily for conversational AI, aiming to provide helpful and harmless responses in English chat and dialogue generation.
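A minimal inference sketch with the Hugging Face `transformers` library is shown below. It assumes the repo's tokenizer ships a chat template (Qwen tokenizers typically do) and that bf16 weights fit on your hardware; the prompt and generation settings are illustrative placeholders, not published defaults.

```python
# Minimal inference sketch; verify the chat template against the repo files.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "akseljoonas/Qwen3-1.7B-DPO-hh-rlhf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is supported on your device
    device_map="auto",
)

messages = [{"role": "user", "content": "How do I brew a good cup of coffee?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,   # placeholder generation settings
    do_sample=True,
    temperature=0.7,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```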
Overview
akseljoonas/Qwen3-1.7B-DPO-hh-rlhf is a 1.72 billion parameter language model built upon the Qwen/Qwen3-1.7B-Base architecture. Its most distinctive characteristic is that it was developed autonomously by a Hugging Face AI agent, which independently selected the architecture, orchestrated the training data, and tuned hyperparameters, including those for the preference-alignment stage.
Key Capabilities
- Conversational AI: Optimized for general-purpose chat and dialogue generation.
- Helpful Assistance: Designed to answer questions and provide information effectively.
- Safe Responses: Fine-tuned to generate responses that minimize harmful content.
- Preference Alignment: Utilizes Direct Preference Optimization (DPO) on the Anthropic HH-RLHF dataset to align outputs with human preferences for helpfulness and harmlessness (see the training sketch after this list).
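The sketch below outlines how a DPO run on HH-RLHF might look using the TRL library. The actual hyperparameters, TRL version, and preprocessing used for this model are not published here, so the values (`beta`, learning rate, batch size) are placeholders; depending on your TRL version, the trainer may accept the tokenizer via `tokenizer=` instead of `processing_class=`, and you may need to split each HH-RLHF example into explicit `prompt`/`chosen`/`rejected` fields yourself.

```python
# Illustrative DPO fine-tuning sketch with TRL; hyperparameters are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "Qwen/Qwen3-1.7B-Base"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# HH-RLHF provides paired "chosen"/"rejected" conversations; recent TRL
# versions can extract the shared prompt from such implicit-prompt pairs.
dataset = load_dataset("Anthropic/hh-rlhf", split="train")

config = DPOConfig(
    output_dir="qwen3-1.7b-dpo-hh-rlhf",
    beta=0.1,                        # placeholder: strength of the KL penalty
    per_device_train_batch_size=2,   # placeholder
    learning_rate=5e-7,              # placeholder
)

trainer = DPOTrainer(
    model=model,                 # a frozen reference copy is created internally
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions: tokenizer=tokenizer
)
trainer.train()
```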
Good for
- Developing English-language chatbots and virtual assistants.
- Applications requiring models that prioritize helpful and harmless interactions.
- Research into autonomously developed AI models and DPO fine-tuning techniques.
Limitations
- Primarily trained on English data; performance in other languages is not guaranteed.
- Knowledge is limited to its training data, potentially leading to hallucinations or outdated information.
- Requires additional safety testing for production deployments, as it may still generate inappropriate content in adversarial scenarios.