nlee-208/uf-mistral-it-sft-g0

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Jul 19, 2024 · License: llama3 · Architecture: Transformer

The nlee-208/uf-mistral-it-sft-g0 model is an 8 billion parameter language model fine-tuned from Meta-Llama-3-8B-Instruct. It was trained on the nlee-208/uf-g0-sft dataset for one epoch with a learning rate of 5e-05 and a context length of 8192 tokens. This model is intended for general instruction-following tasks, leveraging its Llama-3 base architecture.


Model Overview

nlee-208/uf-mistral-it-sft-g0 is an 8-billion-parameter instruction-tuned language model. It is a fine-tuned variant of the Meta-Llama-3-8B-Instruct base model, developed by nlee-208 and trained on the nlee-208/uf-g0-sft dataset.
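To inspect the training data referenced above, the following sketch loads it with the datasets library. The split name and the printed fields are assumptions; the card does not document the dataset's schema.

```python
from datasets import load_dataset

# Load the SFT dataset named in the card; the "train" split is an assumption.
ds = load_dataset("nlee-208/uf-g0-sft", split="train")

print(ds)     # features and row count
print(ds[0])  # first example (schema not documented in the card)
```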

Training Details

The model was fine-tuned for a single epoch with the following hyperparameters; a configuration sketch follows the list:

  • Learning Rate: 5e-05
  • Batch Sizes: train_batch_size of 8, eval_batch_size of 8
  • Gradient Accumulation: 8 steps, leading to a total_train_batch_size of 128
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio
  • Devices: 2 GPUs (multi-GPU distributed training)
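Since the training script itself is not published, here is a minimal sketch of how these hyperparameters map onto the Hugging Face TrainingArguments API. Only the numbers come from the card; the output_dir value and the mapping are illustrative assumptions.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the reported hyperparameters.
# output_dir is a placeholder; the original script is not published.
args = TrainingArguments(
    output_dir="uf-mistral-it-sft-g0",
    num_train_epochs=1,                 # single epoch, as reported
    learning_rate=5e-05,
    per_device_train_batch_size=8,      # train_batch_size: 8
    per_device_eval_batch_size=8,       # eval_batch_size: 8
    gradient_accumulation_steps=8,      # 8 x 8 x 2 devices = 128 effective
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,                     # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```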

Framework Versions

The training used the following library versions (a quick environment check is sketched after the list):

  • Transformers 4.42.4
  • Pytorch 2.1.2.post303
  • Datasets 2.18.0
  • Tokenizers 0.19.1
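For reproduction, it can help to verify that the local environment matches these versions. Below is a small check using only the standard library; the PyPI package names are the usual ones, and the .post303 suffix on the PyTorch version looks like a build-specific tag, so only the base version is compared.

```python
from importlib.metadata import version

# Versions reported in the card; torch's ".post303" suffix is likely a
# build-specific tag, so only the base "2.1.2" is compared.
expected = {
    "transformers": "4.42.4",
    "torch": "2.1.2",
    "datasets": "2.18.0",
    "tokenizers": "0.19.1",
}

for pkg, want in expected.items():
    have = version(pkg)
    status = "OK" if have.startswith(want) else f"mismatch (installed {have})"
    print(f"{pkg}: expected {want} -> {status}")
```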

Intended Use

Given its instruction-tuned training and Llama-3 lineage, this model is suitable for general-purpose instruction-following applications such as question answering, summarization, and conversational assistance.
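A minimal inference sketch follows, assuming the model inherits the Llama-3 chat template from its base; the dtype and device_map choices are illustrative and not prescribed by the card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nlee-208/uf-mistral-it-sft-g0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; not specified by the card
    device_map="auto",
)

# The chat template is assumed to be inherited from Meta-Llama-3-8B-Instruct.
messages = [{"role": "user", "content": "List three uses of a fine-tuned LLM."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```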