CharlesLi/llama_2_sky_safe_o1_4o_reflect_1000_100_full

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Jan 13, 2025 · License: llama2 · Architecture: Transformer (open weights)

CharlesLi/llama_2_sky_safe_o1_4o_reflect_1000_100_full is a 7-billion-parameter model fine-tuned by CharlesLi from meta-llama/Llama-2-7b-chat-hf. It inherits the Llama 2 architecture and its 4096-token context length. Fine-tuning was performed on a dataset identified only as the "generator" dataset, reaching a loss of 0.7435 on the evaluation set. Its primary application is likely conversational or generative tasks, building on its Llama 2 chat foundation.


Model Overview

This model, llama_2_sky_safe_o1_4o_reflect_1000_100_full, is a fine-tuned variant of the meta-llama/Llama-2-7b-chat-hf base model. Developed by CharlesLi, it leverages the robust Llama 2 architecture with 7 billion parameters and a 4096-token context length.
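Assuming the checkpoint is hosted on the Hugging Face Hub under the repository id above (the card does not state this explicitly), loading it follows the standard Transformers pattern; the snippet below is a minimal sketch, not an official quickstart:

```python
# Minimal loading sketch, assuming the weights are published on the
# Hugging Face Hub under the repository id shown in the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "CharlesLi/llama_2_sky_safe_o1_4o_reflect_1000_100_full"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # a 7B model in fp16 fits on one ~16 GB GPU
    device_map="auto",          # requires the accelerate package
)
```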

Training Details

The model was fine-tuned on a dataset identified only as the "generator" dataset. Training used a learning rate of 2e-05, a per-device train_batch_size of 4, and gradient_accumulation_steps of 2, for a total_train_batch_size of 32; since 4 × 2 = 8 per device, the total implies the run was distributed across four devices. Training ran for 1 epoch with an Adam optimizer, cosine learning-rate scheduling, and a warmup ratio of 0.1. On the evaluation set, the model achieved a loss of 0.7435.
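These hyperparameters map directly onto a Hugging Face TrainingArguments configuration. The sketch below is a reconstruction under that assumption; the author's actual training script is not published, and the output_dir name is hypothetical:

```python
# Hypothetical reconstruction of the reported hyperparameters using
# transformers.TrainingArguments; not the author's original script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_sky_safe_o1_4o_reflect_1000_100_full",
    learning_rate=2e-05,
    per_device_train_batch_size=4,   # train_batch_size from the card
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",             # the card reports an Adam-style optimizer
)
# 4 (per device) x 2 (accumulation) x 4 (devices, inferred) = 32 total batch size
```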

Frameworks Used

  • Transformers 4.44.2
  • PyTorch 2.4.1+cu121
  • Datasets 3.0.0
  • Tokenizers 0.19.1

Potential Use Cases

Given its Llama-2-7b-chat-hf foundation and fine-tuning on the generator dataset, this model is likely suitable for the following (see the usage sketch after this list):

  • Conversational AI applications
  • Text generation tasks
  • Further research and experimentation based on the Llama 2 chat model
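As a sketch of the conversational use case, the snippet below applies the Llama 2 chat template and generates a reply. It assumes the fine-tune retains the base model's chat template and that `model` and `tokenizer` were loaded as shown earlier:

```python
# Generation sketch, assuming the fine-tune keeps the Llama 2 chat template.
messages = [{"role": "user", "content": "Summarize what a context window is."}]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```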