CharlesLi/llama_2_sky_safe_o1_llama_3_70B_default_4000_100_full

  • Task: Text generation
  • Concurrency Cost: 1
  • Model Size: 7B
  • Quantization: FP8
  • Context Length: 4k
  • Published: Jan 13, 2025
  • License: llama2
  • Architecture: Transformer (open weights)

The CharlesLi/llama_2_sky_safe_o1_llama_3_70B_default_4000_100_full model is a 7 billion parameter language model, fine-tuned from Meta's Llama-2-7b-chat-hf. It is optimized for generative tasks, achieving a validation loss of 0.6551 on its evaluation set. This model is suitable for applications requiring a Llama-2-based generative AI with a 4096-token context window.


Model Overview

This model, llama_2_sky_safe_o1_llama_3_70B_default_4000_100_full, is a fine-tuned variant of Meta's Llama-2-7b-chat-hf checkpoint. It was fine-tuned on the generator dataset and reaches a validation loss of 0.6551 on the evaluation set.
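As a minimal loading sketch (assuming the checkpoint follows the standard Hugging Face layout; the dtype and device settings are illustrative choices, not taken from the model card):

    # Minimal loading sketch; dtype/device settings are illustrative assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "CharlesLi/llama_2_sky_safe_o1_llama_3_70B_default_4000_100_full"

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.float16,  # half precision keeps the 7B weights manageable
        device_map="auto",          # requires the accelerate package
    )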

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 2e-05
  • Batch Size: 4 per device for both training and evaluation; with 4 GPUs and gradient accumulation, the total effective training batch size is 32.
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: 1

The training loss decreased steadily over the course of training, reaching a final validation loss of 0.6551. Training ran on a multi-GPU setup with 4 devices; a configuration sketch follows below.
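For orientation, these values map onto a Hugging Face TrainingArguments configuration roughly as follows (a hedged sketch: the output directory is a placeholder, and gradient_accumulation_steps=2 is inferred from 4 per device × 4 GPUs × 2 steps = 32):

    from transformers import TrainingArguments

    # Sketch reconstructing the reported hyperparameters; output_dir is a placeholder.
    training_args = TrainingArguments(
        output_dir="llama2-finetune",      # placeholder, not from the model card
        learning_rate=2e-5,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        gradient_accumulation_steps=2,     # 4 per device x 4 GPUs x 2 steps = 32 effective
        num_train_epochs=1,
        lr_scheduler_type="cosine",
        warmup_ratio=0.1,
        adam_beta1=0.9,
        adam_beta2=0.999,
        adam_epsilon=1e-8,
    )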

Framework Versions

The model was developed using:

  • Transformers 4.44.2
  • Pytorch 2.4.1+cu121
  • Datasets 3.0.0
  • Tokenizers 0.19.1
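To approximate that environment, the listed versions can be pinned directly (a sketch; the cu121 wheel index applies to CUDA 12.1 systems and may differ on other platforms):

    pip install transformers==4.44.2 datasets==3.0.0 tokenizers==0.19.1
    pip install torch==2.4.1 --index-url https://download.pytorch.org/whl/cu121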

Intended Use

The original model card does not document specific intended uses or limitations. Given its Llama-2 base and fine-tuning on the generator dataset, the model is presumably suited to general-purpose generative tasks within its 4096-token context window.
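As an illustrative generation sketch (reusing the model and tokenizer loaded in the overview above; the prompt and sampling settings are arbitrary examples):

    # Illustrative generation call; prompt and sampling settings are arbitrary.
    inputs = tokenizer("Write a short summary of gradient accumulation.",
                       return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,   # keep prompt + output within the 4096-token window
        do_sample=True,
        temperature=0.7,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))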