ogx786/tinyllama-sft-dpo-finetuned
The ogx786/tinyllama-sft-dpo-finetuned model is a 1.1 billion parameter TinyLlama-based causal language model, fine-tuned through a two-stage SFT and DPO pipeline. Developed by ogx786, this model builds upon TinyLlama/TinyLlama-1.1B-Chat-v1.0 with a 2048 token context length. It is optimized for improved conversational quality and alignment, demonstrating enhanced BLEU and BERTScore F1 metrics over its base model.
Loading preview...
Model Overview
This model, ogx786/tinyllama-sft-dpo-finetuned, is a 1.1 billion parameter language model derived from TinyLlama/TinyLlama-1.1B-Chat-v1.0. It has undergone a two-stage LoRA fine-tuning process to enhance its conversational capabilities and alignment.
Training Pipeline
The fine-tuning involved two sequential stages:
- Stage 1: Supervised Fine-Tuning (SFT): LoRA fine-tuning was performed on 3,000 samples from the
Open-Orca/OpenOrcadataset. The best trial,sft_5, utilized a rank of 8, alpha of 16, learning rate of 1e-4, and 3 epochs. - Stage 2: Direct Preference Optimization (DPO): The best SFT adapter (
sft_5) served as the starting point for DPO, trained on 2,000 samples from theHuggingFaceH4/ultrafeedback_binarizeddataset. The optimal trial,dpo_4, used a beta of 0.3, learning rate of 5e-6, and 2 epochs.
Evaluation Results
The fine-tuning process led to improvements in generation quality:
- Base TinyLlama-1.1B: Avg. BLEU 0.0387, Avg. BERTScore F1 0.8714
- Best SFT (
sft_5): Avg. BLEU 0.0451, Avg. BERTScore F1 0.8811 - Best DPO (
dpo_4): Avg. BLEU 0.0492, Avg. BERTScore F1 0.8757
Key Capabilities
- Enhanced Conversational Quality: The SFT and DPO stages aim to improve the model's ability to generate coherent and aligned responses in chat-like interactions.
- Efficient Fine-tuning: Utilizes LoRA for efficient adaptation of the base TinyLlama model.
Good for
- Applications requiring a small, efficient language model for chat and conversational tasks.
- Scenarios where improved alignment and response quality are desired over the base TinyLlama model.