Name: ogx786/tinyllama-sft-dpo-finetuned API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ogx786

Model Overview

This model, ogx786/tinyllama-sft-dpo-finetuned, is a 1.1 billion parameter language model derived from TinyLlama/TinyLlama-1.1B-Chat-v1.0. It has undergone a two-stage LoRA fine-tuning process to enhance its conversational capabilities and alignment.

Training Pipeline

The fine-tuning involved two sequential stages:

Stage 1: Supervised Fine-Tuning (SFT): LoRA fine-tuning was performed on 3,000 samples from the Open-Orca/OpenOrca dataset. The best trial, sft_5, utilized a rank of 8, alpha of 16, learning rate of 1e-4, and 3 epochs.
Stage 2: Direct Preference Optimization (DPO): The best SFT adapter (sft_5) served as the starting point for DPO, trained on 2,000 samples from the HuggingFaceH4/ultrafeedback_binarized dataset. The optimal trial, dpo_4, used a beta of 0.3, learning rate of 5e-6, and 2 epochs.

Evaluation Results

The fine-tuning process led to improvements in generation quality:

Base TinyLlama-1.1B: Avg. BLEU 0.0387, Avg. BERTScore F1 0.8714
Best SFT (sft_5): Avg. BLEU 0.0451, Avg. BERTScore F1 0.8811
Best DPO (dpo_4): Avg. BLEU 0.0492, Avg. BERTScore F1 0.8757

Key Capabilities

Enhanced Conversational Quality: The SFT and DPO stages aim to improve the model's ability to generate coherent and aligned responses in chat-like interactions.
Efficient Fine-tuning: Utilizes LoRA for efficient adaptation of the base TinyLlama model.

Good for

Applications requiring a small, efficient language model for chat and conversational tasks.
Scenarios where improved alignment and response quality are desired over the base TinyLlama model.

Overview

Model Overview

Training Pipeline

Evaluation Results

Key Capabilities

Good for

Full Model Card (README)