Name: AIPlans/tinyllama-1.1b-dpo-pku-saferlhf API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: AIPlans

Overview

AIPlans/tinyllama-1.1b-dpo-pku-saferlhf is a compact language model with 1.1 billion parameters. It is fine-tuned from TinyLlama/TinyLlama-1.1B-Chat-v1.0, a base model already tuned for chat-based interactions. Fine-tuning used Direct Preference Optimization (DPO), a method that aligns a model with human preferences by training directly on pairs of chosen and rejected responses, without requiring a separate reward model.
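As a rough illustration of how DPO works without a reward model, the sketch below computes the standard per-pair DPO loss from sequence log-probabilities. The function name `dpo_loss`, the example log-probabilities, and `beta=0.1` are assumptions made for illustration; the listing does not state the beta used for this model or the exact loss variant.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Return (loss, reward_chosen, reward_rejected) for one preference pair.

    beta=0.1 is an assumed value, not taken from the model card.
    Inputs are summed token log-probabilities of each response under
    the policy being trained and under the frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)), written as log(1 + exp(-margin)).
    # Not hardened against overflow for very negative margins.
    loss = math.log1p(math.exp(-margin))
    return loss, reward_chosen, reward_rejected


# Hypothetical log-probabilities, for illustration only.
loss, r_chosen, r_rejected = dpo_loss(-12.0, -15.0, -12.5, -14.8)
print(f"loss={loss:.4f} chosen={r_chosen:.4f} rejected={r_rejected:.4f}")
```

When the policy and the reference assign identical log-probabilities, both implicit rewards are zero and the loss equals ln 2, which is the starting point of DPO training.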

Training Details

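The model was trained for one epoch with a learning rate of 5e-07 and a total batch size of 16 (a train_batch_size of 4 multiplied by gradient_accumulation_steps of 4). The optimizer was Adam with standard betas and epsilon, paired with a cosine learning rate scheduler and a warmup ratio of 0.1. Evaluation metrics during training show a final loss of 0.6742, with Rewards/chosen at 0.0508 and Rewards/rejected at 0.0098. The resulting reward margin of 0.0410 is positive, so the model favors chosen responses over rejected ones, but the margin is small: the loss sits only slightly below ln 2 (about 0.693), the value the standard DPO loss takes before the policy has moved away from its reference model.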
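The figures above can be checked with a few lines of arithmetic. The sketch below reproduces the total batch size, computes the reward margin, and shows the general shape of a cosine schedule with linear warmup. The function names, `num_devices=1`, and the total step count are assumptions for illustration; the listing does not report the number of training steps, and library implementations may round the warmup length differently.

```python
import math

# Values reported in the training details.
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
PEAK_LR = 5e-7
WARMUP_RATIO = 0.1


def effective_batch_size(per_step, accum_steps, num_devices=1):
    """Total batch size; num_devices=1 is an assumption."""
    return per_step * accum_steps * num_devices


def lr_at_step(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Difference between the reported chosen and rejected rewards.
reward_margin = 0.0508 - 0.0098

print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
print(f"reward margin: {reward_margin:.4f}")

# total_steps=1000 is hypothetical; only the schedule shape matters here.
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at_step(step, total_steps=1000):.3e}")
```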

Potential Use Cases

Given its small size and DPO fine-tuning, this model is suited to resource-constrained environments that need a preference-aligned conversational agent. Its compact size makes it practical for edge devices and for applications that require low latency and a small memory footprint, particularly general chat or instruction-following tasks where safety alignment is a consideration.
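To give a sense of the memory footprint, the sketch below estimates the storage needed for the weights alone at several numeric precisions. The helper name `weight_memory_gb` and the list of precisions are illustrative assumptions; the listing does not say which formats are offered, and real usage is higher once activations, the KV cache, and runtime overhead are included.

```python
# Parameter count from the overview (1.1 billion).
NUM_PARAMS = 1.1e9

# Bytes per parameter for common precisions (illustrative choices).
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.2f} GB")
```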
