CharlesLi/llama_2_rlhf_safe_4o_reflect_500_full

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Jan 13, 2025 · License: llama2 · Architecture: Transformer · Open weights

The CharlesLi/llama_2_rlhf_safe_4o_reflect_500_full model is a 7-billion-parameter causal language model fine-tuned from Meta's Llama-2-7b-chat-hf. It underwent additional fine-tuning on a generator dataset, reaching a loss of 1.2095 on its evaluation set. The model is designed for conversational AI applications, combining its Llama 2 chat foundation with RLHF-based safety and reflection tuning.


Model Overview

This model, llama_2_rlhf_safe_4o_reflect_500_full, is a fine-tuned variant of the Meta Llama-2-7b-chat-hf base model. It has 7 billion parameters and was adapted through one additional epoch of training on a generator dataset, reaching an evaluation loss of 1.2095.
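A minimal loading sketch using the Hugging Face transformers library; the repo id comes from the model title above, while the dtype and device settings are illustrative assumptions, not documented settings:

```python
MODEL_ID = "CharlesLi/llama_2_rlhf_safe_4o_reflect_500_full"

def load_model(model_id: str = MODEL_ID):
    """Load the fine-tuned checkpoint with transformers.

    Imports are deferred so the sketch can be inspected without
    torch or transformers installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # assumption: fp16 inference for a 7B model
        device_map="auto",          # place layers on available devices
    )
    return tokenizer, model
```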

Key Training Details

The model was trained using the following hyperparameters:

  • Learning Rate: 2e-05
  • Batch Sizes: train_batch_size of 4, eval_batch_size of 4
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Scheduler: Cosine LR scheduler with a warmup ratio of 0.1
  • Epochs: 1
  • Distributed Training: Multi-GPU setup with 4 devices and 2 gradient accumulation steps, for an effective train batch size of 32 (4 per device × 4 devices × 2 accumulation steps).
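The hyperparameters above can be sketched in plain Python: the effective batch size follows directly from the listed values, and the cosine schedule with a 0.1 warmup ratio has the shape below (the total step count is an illustrative assumption):

```python
import math

# Effective batch size from the listed hyperparameters
per_device_batch = 4
num_devices = 4
grad_accum_steps = 2
total_train_batch = per_device_batch * num_devices * grad_accum_steps  # 32

def cosine_lr_with_warmup(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup over the first warmup_ratio of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The learning rate rises linearly from 0 to 2e-05 over the first 10% of steps, then decays along a cosine curve to 0 at the final step.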

Intended Use

Specific intended uses and limitations are not documented. Its Llama-2-7b-chat-hf foundation suggests suitability for chat-based applications and conversational AI tasks, potentially with improved safety and reflection behavior resulting from the RLHF fine-tuning.
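Because the model inherits the Llama-2-7b-chat-hf foundation, prompts should follow the Llama 2 chat template. A single-turn prompt can be built as below (the system message is an illustrative placeholder):

```python
def build_llama2_chat_prompt(
    user_message: str,
    system_message: str = "You are a helpful, safe assistant.",  # placeholder
) -> str:
    """Format a single-turn prompt in the Llama 2 chat template.

    The [INST]/<<SYS>> markers are those used to train the
    Llama-2-*-chat-hf models; the tokenizer adds the <s> BOS token itself.
    """
    return (
        f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = build_llama2_chat_prompt("Summarize RLHF in one sentence.")
```

The returned string is what you would pass to the tokenizer before calling `model.generate`.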