Nike-Hanmatheekuna/llama3-8b-sft-full

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8k · Published: May 16, 2024 · License: llama3 · Architecture: Transformer

The Nike-Hanmatheekuna/llama3-8b-sft-full model is an 8 billion parameter language model fine-tuned from Meta-Llama-3-8B. It was trained with a learning rate of 2e-05 over 3 epochs using a cosine learning rate scheduler. This model is a specialized version of the Llama 3 architecture, intended for general language tasks following its supervised fine-tuning.

Model Overview

Nike-Hanmatheekuna/llama3-8b-sft-full is an 8 billion parameter language model derived from the Meta-Llama-3-8B architecture. This version has undergone supervised fine-tuning (SFT) to adapt its capabilities, though the specific dataset used for this fine-tuning is not detailed.

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 2e-05
  • Batch Size: 2 per device (train and eval), for an effective batch size of 64 via 32 gradient accumulation steps.
  • Optimizer: Adam with standard betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: 3
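The effective batch size reported above follows from the per-device batch size multiplied by the gradient accumulation steps (and, in general, by the number of devices; a single device is assumed here). A minimal sketch collecting the reported hyperparameters, which are taken from this card rather than from the original training script:

```python
# Hyperparameters as reported on the model card; "num_devices" is an
# assumption (single device) and not stated in the source.
hyperparams = {
    "learning_rate": 2e-05,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 32,
    "num_devices": 1,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "num_train_epochs": 3,
}

# Effective batch size = per-device batch × accumulation steps × devices.
effective_batch_size = (
    hyperparams["per_device_train_batch_size"]
    * hyperparams["gradient_accumulation_steps"]
    * hyperparams["num_devices"]
)
print(effective_batch_size)  # 64
```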

Intended Use

Given its foundation in the Llama 3 series and subsequent fine-tuning, this model is generally suitable for a range of natural language processing tasks. However, without specific information on the fine-tuning dataset, its optimal use cases and limitations are not fully defined. Users should evaluate its performance for their specific applications.
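Since the model is a fine-tune of Meta-Llama-3-8B, it should load through the standard Hugging Face transformers API. The following is a sketch, not an official usage example from the model author; it assumes transformers and torch are installed, that the weights are accessible, and that the hardware can hold an 8B model in bfloat16.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nike-Hanmatheekuna/llama3-8b-sft-full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits the available hardware
    device_map="auto",
)

# Simple generation check to evaluate the model on your own task.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the fine-tuning dataset is undocumented, running a small battery of prompts like this against your target task is the most reliable way to judge fit.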