CharlesLi/llama_2_cot_simplest_alpaca_3_3_epoch_full

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Jan 21, 2025 · License: llama2 · Architecture: Transformer · Open Weights

CharlesLi/llama_2_cot_simplest_alpaca_3_3_epoch_full is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained for 3 epochs on a "generator" dataset and achieved a loss of 0.9498 on the evaluation set. The model is optimized for tasks related to its fine-tuning data, making it suitable for applications whose requirements align with that dataset's characteristics.


Model Overview

This model, llama_2_cot_simplest_alpaca_3_3_epoch_full, is a 7-billion-parameter language model derived from meta-llama/Llama-2-7b-chat-hf. It was fine-tuned on a specific "generator" dataset for 3 epochs, reaching an evaluation loss of 0.9498.
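
The card itself does not include usage code; the snippet below is a minimal loading sketch using the standard Hugging Face transformers API. The model id comes from this card, while the dtype, device placement, and prompt are illustrative assumptions.

```python
# Minimal loading sketch via the Hugging Face transformers API.
# The model id is taken from this card; dtype and device settings
# are illustrative assumptions, not values stated by the author.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_cot_simplest_alpaca_3_3_epoch_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: half precision so a 7B model fits on one GPU
    device_map="auto",
)

prompt = "Explain why the sky is blue."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```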

Training Details

The fine-tuning process used the following key hyperparameters (mirrored in the configuration sketch after this list):

  • Learning Rate: 2e-05
  • Batch Size: 4 (train), 4 (eval)
  • Gradient Accumulation Steps: 2, for a total train batch size of 32 (4 per device × 2 accumulation steps, which implies training across 4 devices)
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Scheduler: Cosine with a warmup ratio of 0.1
  • Epochs: 3
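
For reference, here is a sketch of how the hyperparameters above would be expressed as Hugging Face TrainingArguments. The field values come from this card; the output directory is a placeholder, and the total train batch size of 32 would arise from running these arguments across multiple devices, not from the arguments alone.

```python
# Sketch of the reported hyperparameters as Hugging Face TrainingArguments.
# Values are copied from this card; output_dir is an assumed placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_cot_simplest_alpaca_3_3_epoch_full",  # assumption
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # total batch size 32 comes from multi-device training
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```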

Intended Use

Given its fine-tuning on a specific "generator" dataset, this model is best suited to tasks that match the characteristics and patterns of that training data. Users should weigh this specialization when deciding whether the model's fine-tuned responses fit their application.
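
A hedged inference sketch follows. Because the base model is Llama-2-7b-chat-hf, its [INST] ... [/INST] chat format is a reasonable default, but the card does not state which prompt template the fine-tuning data actually used; an Alpaca-style instruction/response format is also plausible given the model name.

```python
# Inference sketch using the transformers pipeline API.
# The [INST] ... [/INST] wrapper is an assumption inherited from the
# Llama-2 chat base model, not a format confirmed by this card.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="CharlesLi/llama_2_cot_simplest_alpaca_3_3_epoch_full",
    device_map="auto",
)

prompt = "[INST] List three everyday uses of a paperclip. [/INST]"
result = generator(prompt, max_new_tokens=200, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
```

If outputs look degraded with this wrapper, trying an Alpaca-style "Instruction:/Response:" template is a sensible next step.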