Name: AIPlans/TinyLlama-1.1B-IPO-PKU-SafeRLHF API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: AIPlans

Model Overview

AIPlans/TinyLlama-1.1B-IPO-PKU-SafeRLHF is a compact 1.1 billion parameter language model, derived from the TinyLlama/TinyLlama-1.1B-Chat-v1.0 base. This iteration has been fine-tuned using a process that includes 'IPO-PKU-SafeRLHF', which typically implies an application of Reinforcement Learning from Human Feedback (RLHF) with a focus on safety and alignment. While specific details about the fine-tuning dataset are not provided, the process aims to enhance the model's performance and safety characteristics.

Training Details

The model was trained with a learning rate of 5e-06 over 1 epoch, utilizing a batch size of 4 and a total train batch size of 16 with gradient accumulation. The optimizer used was Adam, and the learning rate scheduler was set to cosine with a warmup ratio of 0.1. Evaluation metrics during training show improvements in rewards and accuracies, indicating the fine-tuning process was effective in its objectives.

Key Characteristics

Compact Size: At 1.1 billion parameters, it offers a lightweight solution for various NLP tasks.
RLHF Fine-tuning: The 'SafeRLHF' component suggests an emphasis on generating safer and more aligned responses.
Base Model: Built upon TinyLlama/TinyLlama-1.1B-Chat-v1.0, inheriting its conversational capabilities.

Potential Use Cases

This model is suitable for applications where a smaller, efficient language model with enhanced safety considerations is beneficial. It can be used for:

Conversational AI: Building chatbots or virtual assistants where resource efficiency is key.
Content Generation: Generating text that adheres to safety guidelines.
Research: Exploring the impact of RLHF on smaller language models.

Overview

Model Overview

Training Details

Key Characteristics

Potential Use Cases

Full Model Card (README)