Name: Genie2k/qwen3-0.6b-dpo API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Genie2k

Genie2k/qwen3-0.6b-dpo: DPO LoRA Adapter for Qwen3-0.6B

This model is a Direct Preference Optimization (DPO) LoRA adapter for the Qwen/Qwen3-0.6B base model, developed by Agha Salik Ali and Uzair Nadeem. It was trained on top of an existing Supervised Fine-Tuned (SFT) adapter to further align the model's outputs with human conversational preferences and improve semantic quality.
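For readers unfamiliar with DPO, the per-example objective can be sketched in a few lines of plain Python. This is a minimal illustration of the standard DPO loss, not the authors' training code: the function names are hypothetical, the log-probabilities are made-up inputs, and beta=0.1 is an illustrative value that the listing does not state.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumption: illustrative value, not from the listing
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


# Policy identical to the reference: margin is 0, loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy has shifted probability toward the chosen response: loss drops.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

The loss falls only when the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model. In this setup the reference would presumably be the SFT-adapted model.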

Key Capabilities

Enhanced Conversational Naturalness: Improves the model's ability to generate human-like responses.
Semantic Quality Improvement: Focuses on refining the meaning and coherence of generated text.
Format Stripping: Effective at removing rigid, robotic formatting (e.g., overly structured math explanations) in favor of natural prose.

Training Details

The adapter was fine-tuned with the DPOTrainer from the trl library on the helpful-base split of the Anthropic/hh-rlhf dataset. Training ran for 1 epoch with fp16 mixed precision on two NVIDIA T4 GPUs on Kaggle. Reported evaluation scores are BLEU 6.9213 and BERTScore F1 0.8608.
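Records in Anthropic/hh-rlhf store each preference pair as two full transcripts, "chosen" and "rejected", that share a conversation prefix and differ in the final assistant turn. DPO training typically needs these split into prompt, chosen, and rejected fields. The sketch below shows one common way to do that split. The helper names are hypothetical, and the listing does not say how the authors preprocessed the data.

```python
ASSISTANT_MARKER = "\n\nAssistant:"


def split_prompt_and_response(dialogue: str) -> tuple[str, str]:
    """Split a transcript at its final assistant turn.

    Returns (prompt, response), where the prompt ends with the last
    "\n\nAssistant:" marker and the response is everything after it.
    """
    idx = dialogue.rfind(ASSISTANT_MARKER)
    if idx == -1:
        raise ValueError("transcript contains no assistant turn")
    cut = idx + len(ASSISTANT_MARKER)
    return dialogue[:cut], dialogue[cut:]


def to_preference_record(example: dict) -> dict:
    """Convert one raw record into prompt / chosen / rejected fields."""
    prompt, chosen = split_prompt_and_response(example["chosen"])
    # Assumption: the rejected transcript shares the same prompt prefix,
    # so only its final response is kept.
    _, rejected = split_prompt_and_response(example["rejected"])
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


# Toy record in the dataset's transcript format (not real dataset content).
example = {
    "chosen": "\n\nHuman: Hi\n\nAssistant: Hello! How can I help?",
    "rejected": "\n\nHuman: Hi\n\nAssistant: What.",
}
record = to_preference_record(example)
```

With the datasets library, a function like to_preference_record could be applied to the whole split through Dataset.map.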

Limitations

Because of the small 0.6B-parameter base and heavy alignment training, the model is susceptible to "format collapse" on open-ended creative tasks, which can lead to repetitive loops. DPO training does not inject new factual knowledge, so the model may still hallucinate where the base model lacks domain understanding.
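When calling the model from an application, it can help to flag outputs that have fallen into a repetitive loop. The following is a rough heuristic sketch, not part of the model or of any library: the function names, the whitespace tokenization, the n-gram size, and the 0.5 threshold are all illustrative assumptions.

```python
from collections import Counter


def repeated_ngram_fraction(text: str, n: int = 3) -> float:
    """Fraction of word n-grams in `text` that occur more than once."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


def looks_like_loop(text: str, n: int = 3, threshold: float = 0.5) -> bool:
    """Heuristic flag: True when at least `threshold` of n-grams repeat."""
    return repeated_ngram_fraction(text, n) >= threshold


looping = "the cat sat " * 6
normal = "a quick brown fox jumps over the lazy dog"

loop_flag = looks_like_loop(looping)
normal_fraction = repeated_ngram_fraction(normal)
```

A caller could use such a flag to truncate or retry a generation. The threshold would need tuning against real outputs.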
