Name: AIPlans/tinyllama-1.1b-dpo-pku-saferlhf_2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: AIPlans

Model Overview

AIPlans/tinyllama-1.1b-dpo-pku-saferlhf_2 is a 1.1-billion-parameter language model built on TinyLlama/TinyLlama-1.1B-Chat-v1.0. It has been fine-tuned with Direct Preference Optimization (DPO) with an emphasis on safety. The "pku-saferlhf" in its name suggests preference data from the PKU-SafeRLHF dataset, although the dataset actually used for this fine-tuning is not documented.

Training Details

The model was trained for one epoch with a learning rate of 5e-06 and a total batch size of 16 (train_batch_size=4 with gradient_accumulation_steps=4). The optimizer was Adam with standard betas and epsilon, paired with a cosine learning rate scheduler and a warmup ratio of 0.1. Over the course of training, rewards/accuracies rose to 0.8000, and the final validation loss was 0.4486.
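The batch-size arithmetic and the learning rate schedule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the training script: the total step count (1000) is hypothetical because the dataset size is not given, a single device is assumed because 4 x 4 already equals 16, and the warmup is assumed to be linear, as is common for cosine schedules.

```python
import math

PEAK_LR = 5e-6       # learning rate from the training details
WARMUP_RATIO = 0.1   # warmup ratio from the training details


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Total batch size seen by one optimizer step (num_devices=1 is an assumption)."""
    return per_device_batch * grad_accum_steps * num_devices


def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; the real value depends on the dataset size
    print("effective batch size:", effective_batch_size(4, 4))
    for step in (0, 50, 100, 550, 1000):
        print(step, lr_at(step, total_steps))
```

With these assumptions the rate climbs linearly over the first 10% of steps, peaks at 5e-06, and decays along a half cosine to zero at the final step.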

Key Characteristics

Compact Size: At 1.1 billion parameters, it offers a lightweight solution for deployment.
DPO Fine-tuning: Leverages Direct Preference Optimization for improved alignment and response quality.
Safety Focus: Incorporates techniques aimed at enhancing safety, indicated by the "saferlhf" in its name.
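For readers unfamiliar with DPO, the sketch below shows the standard sigmoid form of the per-pair loss and how a rewards/accuracies figure is typically computed from it. It is a minimal illustration, not this model's training code: the value beta=0.1 and all log-probabilities are hypothetical, since neither the beta used nor the data is documented.

```python
import math


def dpo_pair(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Return (loss, chosen_reward, rejected_reward) for one preference pair.

    Rewards are beta-scaled log-probability ratios between the policy and
    the frozen reference model. beta=0.1 is an assumed value.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    loss = math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
    return loss, chosen_reward, rejected_reward


def reward_accuracy(pairs, beta=0.1):
    """Fraction of pairs where the chosen reward exceeds the rejected reward."""
    hits = 0
    for pair in pairs:
        _, chosen_reward, rejected_reward = dpo_pair(*pair, beta=beta)
        hits += chosen_reward > rejected_reward
    return hits / len(pairs)


if __name__ == "__main__":
    # Hypothetical sequence log-probabilities:
    # (policy_chosen, policy_rejected, ref_chosen, ref_rejected)
    good = (-5.0, -9.0, -6.0, -8.0)   # policy favors the chosen response
    bad = (-9.0, -5.0, -8.0, -6.0)    # policy favors the rejected response
    print(reward_accuracy([good, good, good, good, bad]))
    print(dpo_pair(-10.0, -10.0, -10.0, -10.0)[0])  # no preference: ln 2
```

Under this formulation, a policy with no preference between the two responses scores a loss of ln 2 (about 0.693), so a validation loss below that value indicates the model ranks chosen responses above rejected ones on average.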

Potential Use Cases

This model is suitable for applications requiring a small, efficient language model with enhanced safety characteristics, such as:

Lightweight chatbots or conversational agents.
Content generation where safety and alignment are priorities.
Edge device deployment or resource-constrained environments.
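To gauge fit for resource-constrained environments, a rough weights-only memory estimate can be derived from the parameter count. The sketch below assumes a nominal 1.1e9 parameters and common storage widths. It ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}

NUM_PARAMS = 1.1e9  # nominal parameter count; the exact figure may differ slightly


def weight_memory_gb(num_params, dtype):
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, dtype):.2f} GB")
```

At 16-bit precision the weights come to roughly 2.2 GB, and 4-bit quantization would bring that to about 0.55 GB, before any runtime overhead.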
