tsavage68/chat_350STEPS_1e5_SFT

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Feb 13, 2024 · Architecture: Transformer

The tsavage68/chat_350STEPS_1e5_SFT is a 7 billion parameter language model, fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained for 350 steps with a learning rate of 0.0001, reaching a final validation loss of 0.3260. It is a general-purpose chat model, suitable for conversational AI tasks.


Model Overview

The tsavage68/chat_350STEPS_1e5_SFT is a 7 billion parameter language model, derived from the meta-llama/Llama-2-7b-chat-hf architecture. This model has undergone a specific fine-tuning process, indicated by its "SFT" (Supervised Fine-Tuning) designation.
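Assuming the checkpoint is published on the Hugging Face Hub under the same name, loading it would follow the standard transformers pattern for Llama-2 derivatives. This is a sketch, not from the card: the repository id and the dtype/device choices are assumptions.

```python
def load_chat_model(repo_id: str = "tsavage68/chat_350STEPS_1e5_SFT"):
    """Load the fine-tuned checkpoint and its tokenizer (sketch).

    The import is kept inside the function so the sketch can be read
    and inspected without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        device_map="auto",   # place layers on available GPUs/CPU
        torch_dtype="auto",  # use the dtype stored in the checkpoint
    )
    return model, tokenizer
```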

Training Details

The model was trained using the following key hyperparameters:

  • Base Model: Llama-2-7b-chat-hf
  • Learning Rate: 0.0001
  • Training Steps: 350
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Batch Size: 4 (train), 1 (eval), with 2 gradient accumulation steps, giving a total train batch size of 8.
  • LR Scheduler: Cosine, with 100 warmup steps.
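The effective batch size and the learning-rate trajectory implied by these hyperparameters can be reproduced in a few lines. The schedule below mirrors the linear-warmup-then-cosine-decay shape of transformers' cosine scheduler; it is a sketch of that shape, not the exact library code.

```python
import math

TRAIN_BATCH = 4
GRAD_ACCUM = 2
EFFECTIVE_BATCH = TRAIN_BATCH * GRAD_ACCUM  # 4 * 2 = 8, as reported

BASE_LR = 1e-4       # the card's learning rate of 0.0001
WARMUP_STEPS = 100
TOTAL_STEPS = 350

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Cosine decay from the base rate down to 0 at the final step.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

At step 0 the rate is 0, it peaks at the base rate of 1e-4 at step 100, and decays to (numerically) 0 by step 350.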

During training, the model achieved a final validation loss of 0.3260, with the training loss decreasing progressively over the 350 steps. Training used Transformers 4.37.2, PyTorch 2.0.0+cu117, Datasets 2.17.0, and Tokenizers 0.15.2.

Intended Use

As a fine-tuned chat model, it is broadly suitable for conversational AI applications. Assessing more specific use cases and limitations would require additional information about the dataset used for fine-tuning, which the card does not provide.