Nike-Hanmatheekuna/llama3-8b-sft-full
The Nike-Hanmatheekuna/llama3-8b-sft-full model is an 8 billion parameter language model fine-tuned from Meta-Llama-3-8B. It was trained with a learning rate of 2e-05 over 3 epochs using a cosine learning rate scheduler. The result is a supervised fine-tuned (SFT) variant of the Llama 3 architecture intended for general language tasks.
Model Overview
Nike-Hanmatheekuna/llama3-8b-sft-full is an 8 billion parameter language model fine-tuned from the Meta-Llama-3-8B base model. It has undergone supervised fine-tuning (SFT) to adapt its capabilities, though the dataset used for fine-tuning is not documented.
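As a Llama 3 derivative, the checkpoint can presumably be loaded with the Hugging Face `transformers` library. The following is a minimal inference sketch, assuming the repository follows the standard Llama 3 layout on the Hub (the prompt and dtype choice are illustrative assumptions, not documented settings):

```python
# Minimal inference sketch. Assumes the checkpoint follows the standard
# Llama 3 layout on the Hugging Face Hub; not an official usage example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nike-Hanmatheekuna/llama3-8b-sft-full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 keeps the 8B model within one modern GPU
    device_map="auto",
)

prompt = "Explain supervised fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```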
Training Details
The model was trained with the following key hyperparameters (see the configuration sketch after the list):
- Learning Rate: 2e-05
- Batch Size: 2 per device (train and eval); with 32 gradient accumulation steps, the total effective batch size is 64.
- Optimizer: Adam with standard betas and epsilon.
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
- Epochs: 3
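These hyperparameters map directly onto Hugging Face `TrainingArguments`. Below is a hedged sketch of an equivalent configuration; the original training script is not published, so the output path, precision, and optimizer name are assumptions, and only the listed values are taken from the card:

```python
# Sketch of TrainingArguments mirroring the reported hyperparameters.
# The actual training code is not published; only the values listed
# above are reproduced here. Betas/epsilon are transformers defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-sft-full",   # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=32,    # 2 x 32 = effective batch size 64
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",               # Adam-family optimizer with default betas/epsilon
    bf16=True,                         # assumption: mixed precision for an 8B model
)
```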
Intended Use
Given its Llama 3 foundation and subsequent supervised fine-tuning, this model should be suitable for a range of general natural language processing tasks. However, because the fine-tuning dataset is not documented, its optimal use cases and limitations are not fully characterized. Users should evaluate its performance on their specific applications before deployment.