tsavage68/chat_350STEPS_1e5_SFT

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Feb 13, 2024 · Architecture: Transformer

The tsavage68/chat_350STEPS_1e5_SFT is a 7 billion parameter language model, fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained for 350 steps with a learning rate of 0.0001, reaching a final validation loss of 0.3260. It is a general-purpose chat model, suitable for conversational AI tasks.


Model Overview

The tsavage68/chat_350STEPS_1e5_SFT is a 7 billion parameter language model, derived from the meta-llama/Llama-2-7b-chat-hf architecture. This model has undergone a specific fine-tuning process, indicated by its "SFT" (Supervised Fine-Tuning) designation.
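Assuming the checkpoint is published on the Hugging Face Hub under the same name, loading it would follow the standard transformers pattern for Llama-2 derivatives. This is a sketch, not from the card: the repository id and the dtype/device choices are assumptions.

```python
def load_chat_model(repo_id: str = "tsavage68/chat_350STEPS_1e5_SFT"):
    """Load the fine-tuned checkpoint and its tokenizer (sketch).

    The import is kept inside the function so the sketch can be read
    and inspected without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        device_map="auto",   # place layers on available GPUs/CPU
        torch_dtype="auto",  # use the dtype stored in the checkpoint
    )
    return model, tokenizer
```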

Training Details

The model was trained using the following key hyperparameters:

  • Base Model: Llama-2-7b-chat-hf
  • Learning Rate: 0.0001
  • Training Steps: 350
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Batch Size: 4 (train), 1 (eval), with 2 gradient accumulation steps, giving a total train batch size of 8.
  • LR Scheduler: Cosine, with 100 warmup steps.
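The effective batch size and the learning-rate trajectory implied by these hyperparameters can be reproduced in a few lines. The schedule below mirrors the linear-warmup-then-cosine-decay shape of transformers' cosine scheduler; it is a sketch of that shape, not the exact library code.

```python
import math

TRAIN_BATCH = 4
GRAD_ACCUM = 2
EFFECTIVE_BATCH = TRAIN_BATCH * GRAD_ACCUM  # 4 * 2 = 8, as reported

BASE_LR = 1e-4       # the card's learning rate of 0.0001
WARMUP_STEPS = 100
TOTAL_STEPS = 350

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Cosine decay from the base rate down to 0 at the final step.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

At step 0 the rate is 0, it peaks at the base rate of 1e-4 at step 100, and decays to (numerically) 0 by step 350.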

During training, the model achieved a final validation loss of 0.3260, with the training loss decreasing progressively over the 350 steps. Training used Transformers 4.37.2, PyTorch 2.0.0+cu117, Datasets 2.17.0, and Tokenizers 0.15.2.

Intended Use

As a fine-tuned chat model, it is broadly suitable for conversational AI applications. Assessing more specific use cases and limitations would require additional information about the dataset used for fine-tuning, which the card does not provide.