CharlesLi/llama_2_rlhf_safe_4o_default_1000_full

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Jan 13, 2025 · License: llama2 · Architecture: Transformer · Open weights

CharlesLi/llama_2_rlhf_safe_4o_default_1000_full is a 7-billion-parameter model fine-tuned by CharlesLi from Llama-2-7b-chat-hf. It was fine-tuned on the generator dataset, indicating an optimization for text generation tasks, and retains the Llama 2 architecture with a 4096-token context length, targeting safe, aligned outputs through Reinforcement Learning from Human Feedback (RLHF).


Model Overview

This model, llama_2_rlhf_safe_4o_default_1000_full, is a fine-tuned variant of the meta-llama/Llama-2-7b-chat-hf base model, developed by CharlesLi. It has 7 billion parameters and a 4096-token context length. Fine-tuning used the generator dataset, suggesting an emphasis on text generation quality and on safety via RLHF.
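
A minimal loading sketch using the transformers library, assuming the checkpoint is hosted under this repo id on the Hugging Face Hub; the fp16 dtype is an assumption for single-GPU inference, not a statement about the published weights:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the model card; assumes the weights are on the Hub.
model_id = "CharlesLi/llama_2_rlhf_safe_4o_default_1000_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: half precision to fit a 7B model on one GPU
    device_map="auto",          # place layers automatically across available devices
)
```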

Key Training Details

The model was trained for 1 epoch with a learning rate of 2e-05, using the Adam optimizer, a cosine learning rate scheduler, and a warmup ratio of 0.1. Training used a total batch size of 32 across 4 GPUs. The model reached an evaluation loss of 1.3804.
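
For illustration, here are the reported hyperparameters expressed as Hugging Face TrainingArguments. This is a sketch, not the author's training script: the per-device batch size (8 × 4 GPUs = 32 total), the AdamW variant, and the mixed-precision setting are assumptions, since the card only states the totals.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_rlhf_safe_4o_default_1000_full",
    learning_rate=2e-5,              # reported learning rate
    num_train_epochs=1,              # reported: 1 epoch
    per_device_train_batch_size=8,   # assumption: 8 per GPU x 4 GPUs = total batch size 32
    lr_scheduler_type="cosine",      # reported scheduler
    warmup_ratio=0.1,                # reported warmup ratio
    optim="adamw_torch",             # "Adam optimizer" per the card
    bf16=True,                       # assumption: mixed-precision training
)
```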

Intended Use Cases

Given its fine-tuning on the generator dataset and the application of RLHF for safety, this model is suited to applications requiring controlled, safe text generation. Potential uses include chatbots, content creation, and other generative AI tasks where adherence to safety guidelines is crucial, as in the usage sketch below.
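
A hypothetical chat-style usage example, reusing the model and tokenizer from the loading sketch above; it relies on the Llama 2 chat template shipped with the base tokenizer, and the prompt is purely illustrative:

```python
# Build a single-turn conversation and format it with the chat template.
messages = [
    {"role": "user", "content": "Explain how vaccines work in simple terms."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    return_tensors="pt",
).to(model.device)

# Greedy decoding for a deterministic, easy-to-inspect reply.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```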