CharlesLi/llama_2_cot_simplest_alpaca_4_3_epoch_full

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Context Length: 4k | Published: Jan 21, 2025 | License: llama2 | Architecture: Transformer | Open Weights

CharlesLi/llama_2_cot_simplest_alpaca_4_3_epoch_full is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained for 3 epochs on a dataset identified only as "generator", reaching a validation loss of 1.0590. The model is intended for general conversational tasks and retains the Llama 2 architecture's 4096-token context length.


Model Overview

This model, llama_2_cot_simplest_alpaca_4_3_epoch_full, is a fine-tuned variant of the meta-llama/Llama-2-7b-chat-hf base model. It has 7 billion parameters and was trained for 3 epochs on the "generator" dataset. Training used a learning rate of 2e-05, an effective batch size of 32 (reached via gradient accumulation), and the Adam optimizer.

Training Details

The model was trained on a multi-GPU setup (4 devices) with a cosine learning-rate scheduler and a warmup ratio of 0.1. The final validation loss on the evaluation set was 1.0590. Training used Transformers 4.44.2, PyTorch 2.4.1+cu121, Datasets 3.0.0, and Tokenizers 0.19.1.

Key Characteristics

  • Base Model: Fine-tuned from meta-llama/Llama-2-7b-chat-hf.
  • Parameter Count: 7 billion parameters.
  • Training Epochs: 3 epochs on the "generator" dataset.
  • Validation Loss: 1.0590 on the evaluation set.

Intended Use Cases

Given its fine-tuning on the "generator" dataset, this model is best suited to text generation and conversational AI tasks, building on the instruction-following capabilities of its Llama-2-7b-chat-hf base. A short inference sketch follows.