ZhangShenao/baseline-Llama-3-8B-Instruct-sft

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 8B
  • Quantization: FP8
  • Context Length: 8k
  • License: llama3
  • Architecture: Transformer

ZhangShenao/baseline-Llama-3-8B-Instruct-sft is an 8-billion-parameter Llama 3 instruction model, fine-tuned from Meta-Llama-3-8B-Instruct on a generator dataset. With a context length of 8192 tokens, it is intended for text generation tasks.


Model Overview

This model, baseline-Llama-3-8B-Instruct-sft, is an 8 billion parameter language model developed by ZhangShenao. It is a fine-tuned version of the meta-llama/Meta-Llama-3-8B-Instruct base model, specifically adapted through supervised fine-tuning (SFT).

Key Characteristics

  • Base Model: Meta-Llama-3-8B-Instruct
  • Parameter Count: 8 billion parameters
  • Context Length: 8192 tokens
  • Fine-tuning: Supervised fine-tuning (SFT) on a generator dataset
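
Since the model is fine-tuned from an instruction-tuned base, it presumably inherits the standard Llama 3 chat template. A minimal sketch of formatting a conversation with it (this assumes the repository ships the template in its tokenizer config; the messages are illustrative):

```python
from transformers import AutoTokenizer

# Assumes the repo ships the standard Llama 3 chat template in its tokenizer config.
tokenizer = AutoTokenizer.from_pretrained("ZhangShenao/baseline-Llama-3-8B-Instruct-sft")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Llama 3 architecture in one sentence."},
]

# Render the conversation into the single prompt string the model expects.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```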

Training Details

The model was trained with the following key hyperparameters (a configuration sketch follows the list):

  • Learning Rate: 2e-05
  • Batch Size: A total training batch size of 128 (train_batch_size: 4, gradient_accumulation_steps: 4, num_devices: 8)
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Epochs: 3
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
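
For reference, these hyperparameters map onto a standard `transformers` `TrainingArguments` setup roughly as follows. This is a sketch, not the author's actual training script; the dataset and trainer wiring are omitted:

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters; not the author's actual script.
# The effective batch size of 128 = 4 (per device) x 4 (accumulation) x 8 (GPUs).
training_args = TrainingArguments(
    output_dir="baseline-Llama-3-8B-Instruct-sft",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```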

Intended Use

Given its supervised fine-tuning on a generator dataset, this model is primarily intended for text generation. Specific use cases depend on the contents of that dataset, which the model card does not detail.
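
A minimal text-generation example with the `transformers` pipeline (the sampling parameters and dtype/device choices here are illustrative, not taken from the model card):

```python
import torch
from transformers import pipeline

# Load the fine-tuned model; bfloat16 and device_map are illustrative choices.
generator = pipeline(
    "text-generation",
    model="ZhangShenao/baseline-Llama-3-8B-Instruct-sft",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a haiku about fine-tuning."},
]

# For message-style inputs, the pipeline applies the chat template automatically.
output = generator(messages, max_new_tokens=128, do_sample=True, temperature=0.7)
print(output[0]["generated_text"][-1]["content"])
```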