PirxTion/qwen3-dpo-tulu
Text Generation | Concurrency Cost: 1 | Model Size: 0.8B | Quant: BF16 | Ctx Length: 32k | Published: May 24, 2025 | Architecture: Transformer | Warm
PirxTion/qwen3-dpo-tulu is a compact 0.8 billion parameter language model, fine-tuned from unsloth/Qwen3-0.6B-Base with Direct Preference Optimization (DPO) using the TRL framework. It has a context length of 40960 tokens. DPO training aligns the model's responses with human preferences, so it is best suited to text generation tasks where preference-aligned, context-aware output matters.
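To make the DPO objective mentioned above concrete, here is a minimal sketch of the standard per-example DPO loss in plain Python. It is illustrative only and is not taken from this model's training code: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for the example. The inputs are summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the model being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (hypothetical helper, for illustration only).

    Computes -log(sigmoid(beta * (chosen_margin - rejected_margin))), where
    each margin is the policy log-probability minus the reference
    log-probability for that response.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch on the sign of x
    # so that exp() never receives a large positive argument.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# The policy favors the chosen response more than the reference does,
# so the loss falls below the neutral value log(2).
example = dpo_loss(-10.0, -12.0, -11.0, -11.0)
print(example < math.log(2))
```

When the policy and reference agree exactly, the loss is log(2); it shrinks as the policy raises the chosen response relative to the rejected one, which is the behavior preference tuning aims for.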