Hahmdong/PERSONA-qwen3-4b-quirky
Hahmdong/PERSONA-qwen3-4b-quirky is a 4 billion parameter language model fine-tuned from Qwen/Qwen3-4B, featuring a 32768 token context length. This model was trained using Direct Preference Optimization (DPO) to enhance its conversational and response generation capabilities. It is designed for applications requiring nuanced and quirky text generation, building upon the robust Qwen3 architecture.
Loading preview...
Model Overview
Hahmdong/PERSONA-qwen3-4b-quirky is a 4 billion parameter language model, fine-tuned from the base Qwen/Qwen3-4B architecture. This model leverages a substantial 32768 token context window, allowing for processing and generating longer, more coherent texts. Its development focused on enhancing conversational quality and generating distinctive responses.
Key Capabilities
- Fine-tuned for Persona-based Generation: Specifically trained to produce text with a quirky persona, making it suitable for creative applications.
- Direct Preference Optimization (DPO): Utilizes the DPO method, as detailed in "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," to align model outputs with human preferences, leading to more desirable and engaging responses.
- Robust Base Model: Built upon the Qwen3-4B foundation, inheriting its strong language understanding and generation abilities.
Training Details
The model was trained using the TRL (Transformer Reinforcement Learning) framework, specifically employing the DPO algorithm. This approach helps in refining the model's output based on preference data, optimizing for quality and style. The training utilized TRL version 0.27.1, Transformers 4.57.6, Pytorch 2.9.0, Datasets 4.0.0, and Tokenizers 0.22.2.
Good For
- Generating creative and unique text with a distinct personality.
- Conversational AI where a quirky or specific persona is desired.
- Applications requiring a model that has been optimized for human preferences through DPO.