nlee-208/uf-mistral-it-sft-g0
The nlee-208/uf-mistral-it-sft-g0 model is an 8 billion parameter language model fine-tuned from Meta-Llama-3-8B-Instruct. It was trained on the nlee-208/uf-g0-sft dataset for one epoch with a learning rate of 5e-05 and a context length of 8192 tokens. This model is intended for general instruction-following tasks, leveraging its Llama-3 base architecture.
Model Overview
nlee-208/uf-mistral-it-sft-g0 is an 8-billion-parameter instruction-tuned language model developed by nlee-208. It is a fine-tuned variant of the Meta-Llama-3-8B-Instruct base model, trained on the nlee-208/uf-g0-sft dataset.
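Assuming the checkpoint follows the standard Hugging Face layout, it should load with the usual transformers API. A minimal sketch; the bfloat16 dtype and automatic device placement are illustrative choices, not specified on this card:

```python
# Minimal loading sketch using the standard transformers API.
# The dtype and device_map settings below are assumptions, not card facts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nlee-208/uf-mistral-it-sft-g0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # an 8B model in bf16 needs roughly 16 GB of GPU memory
    device_map="auto",           # requires the accelerate package
)
```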
Training Details
Fine-tuning ran for a single epoch with the following hyperparameters (a hypothetical configuration reproducing them appears after the list):
- Learning Rate: 5e-05
- Batch Sizes: train_batch_size of 8, eval_batch_size of 8
- Gradient Accumulation: 8 steps, yielding a total_train_batch_size of 128
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio
- Devices: distributed training across 2 GPUs
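The training script is not published alongside this card, but the reported hyperparameters map directly onto transformers.TrainingArguments. A minimal sketch, assuming a standard Trainer setup; the output directory and bf16 precision are assumptions, not stated above:

```python
# Hypothetical TrainingArguments mirroring the hyperparameters reported above.
# output_dir and bf16 are assumptions; they do not appear on the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="uf-mistral-it-sft-g0",  # assumed; actual path unknown
    num_train_epochs=1,
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,      # 8 per device x 8 steps x 2 GPUs = 128 total
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,                          # assumed precision
)
```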
Framework Versions
The training utilized:
- Transformers 4.42.4
- PyTorch 2.1.2.post303
- Datasets 2.18.0
- Tokenizers 0.19.1
Intended Use
Given its instruction tuning and Llama-3 lineage, this model is suited to general-purpose instruction-following applications, as sketched below.
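Continuing from the loading sketch above, a single-turn instruction can be issued through the tokenizer's chat template. This assumes the tokenizer inherits the Meta-Llama-3-8B-Instruct chat template, which this card does not confirm:

```python
# Single-turn instruction following; assumes a Llama-3-style chat template
# is bundled with the tokenizer (an assumption, not confirmed by the card).
messages = [
    {"role": "user", "content": "Explain gradient accumulation in two sentences."}
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```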