CharlesLi/llama_2_rlhf_safe_llama_3_8B_reflect_1000_full
The CharlesLi/llama_2_rlhf_safe_llama_3_8B_reflect_1000_full model is a 7-billion-parameter Llama 2-based language model fine-tuned from meta-llama/Llama-2-7b-chat-hf. It was trained on a reflection dataset with the aim of improving safety and alignment following Reinforcement Learning from Human Feedback (RLHF) principles, and is intended for general language generation tasks where safety and adherence to RLHF objectives are prioritized.
Model Overview
This model, llama_2_rlhf_safe_llama_3_8B_reflect_1000_full, is a fine-tuned variant of meta-llama/Llama-2-7b-chat-hf developed by CharlesLi. Starting from the 7-billion-parameter Llama 2 base, it was fine-tuned on a generator dataset following Reinforcement Learning from Human Feedback (RLHF) principles, with a particular focus on a 'reflection' component. Training used a learning rate of 2e-05, a total batch size of 32, and ran for 1 epoch.
Key Training Details
- Base Model: meta-llama/Llama-2-7b-chat-hf
- Parameters: 7 billion
- Training Objective: Enhanced safety and alignment via RLHF with reflection.
- Evaluation Loss: 0.7626
- Hyperparameters: learning rate 2e-05, total batch size 32, 1 epoch; Adam optimizer with a cosine learning rate scheduler and a 0.1 warmup ratio.
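The cosine schedule with a 0.1 warmup ratio mentioned above can be sketched in plain Python. This is a minimal illustration of the schedule shape (linear warmup to the peak rate, then cosine decay to zero), not the exact implementation used during training; the function name and the assumption of decay to zero are illustrative.

```python
import math

def cosine_lr_with_warmup(step, total_steps, peak_lr=2e-05, warmup_ratio=0.1):
    """Linear warmup to peak_lr over the first warmup_ratio of steps,
    then cosine decay from peak_lr down to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: learning rate rises linearly from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: cosine curve from peak_lr (progress=0) to 0 (progress=1).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# With 1000 total steps and warmup_ratio=0.1, warmup ends at step 100,
# where the schedule reaches its peak of 2e-05.
```

With the hyperparameters reported on this card (lr 2e-05, warmup ratio 0.1), the schedule peaks at the end of warmup and decays smoothly for the remainder of the single epoch.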
Intended Use Cases
This model suits applications that need a Llama 2-based language model with improved safety and alignment, properties targeted by its RLHF- and reflection-based fine-tuning. It can be applied to a range of generative tasks where controlled, safer outputs are desired.
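The card does not specify a prompt template, but since the base model is meta-llama/Llama-2-7b-chat-hf, the standard Llama 2 chat format presumably applies. Below is a sketch of that format; the helper function is hypothetical, and assuming the checkpoint kept the base model's template, the resulting string would be passed to the tokenizer (which prepends the BOS token) for generation.

```python
def build_llama2_prompt(user_message, system_message=None):
    """Format a single-turn prompt in the standard Llama 2 chat style:
    an optional <<SYS>> block followed by the user turn inside [INST] tags."""
    if system_message is not None:
        return (f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
                f"{user_message} [/INST]")
    return f"[INST] {user_message} [/INST]"

# Example prompt for this safety-focused checkpoint.
prompt = build_llama2_prompt(
    "Summarize the benefits of RLHF in one sentence.",
    system_message="You are a helpful, safe assistant.",
)
```

In practice, `tokenizer.apply_chat_template` from the transformers library handles this formatting automatically when the checkpoint ships a chat template.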