nongfuyulang/engineer-heavy-500k-barc-llama3.1-8b-ins-fft-induction_lr1e-5_epoch3

Text Generation · Model Size: 8B · Quant: BF16 · Ctx Length: 32k · Published: Nov 19, 2024 · License: llama3.1 · Architecture: Transformer

The nongfuyulang/engineer-heavy-500k-barc-llama3.1-8b-ins-fft-induction_lr1e-5_epoch3 model is a fine-tuned variant of Meta-Llama-3.1-8B-Instruct, developed by nongfuyulang. This 8-billion-parameter instruction-tuned model was trained for 2 epochs with a learning rate of 1e-05 and reached a final validation loss of 0.2710. The training dataset is not documented in the card.


Model Overview

This model, engineer-heavy-500k-barc-llama3.1-8b-ins-fft-induction_lr1e-5_epoch3, is a fine-tuned version of the Meta-Llama-3.1-8B-Instruct base model, developed by nongfuyulang. Beyond the training hyperparameters below, the card does not document the training data or the target application.
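
The checkpoint can presumably be loaded with the standard `transformers` API. The sketch below is illustrative only: it assumes the repo id from this card resolves on the Hugging Face Hub and that the tokenizer ships the usual Llama 3.1 chat template.

```python
# Minimal inference sketch; assumes the repo id below is available on the
# Hugging Face Hub and includes the standard Llama 3.1 chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nongfuyulang/engineer-heavy-500k-barc-llama3.1-8b-ins-fft-induction_lr1e-5_epoch3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the quantization listed above
    device_map="auto",
)

# Instruct checkpoints expect the chat template, not raw prompts.
messages = [{"role": "user", "content": "Summarize what an induction task is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```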

Training Details

The model was fine-tuned over 2 epochs using a learning rate of 1e-05. Key training hyperparameters (mapped to a config sketch after this list) included:

  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Batch Size: 16 (train and eval), with a total distributed batch size of 128 across 8 GPUs
  • LR Scheduler: Cosine type with a warmup ratio of 0.1
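
The original training script is not published; the snippet below is a hypothetical reconstruction of these settings using Hugging Face `TrainingArguments`. Every field value here simply restates the card, and the output directory name is an assumption, not the author's actual configuration.

```python
# Hypothetical reconstruction of the reported hyperparameters; the original
# training script is not public, so this is a sketch, not the real config.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="engineer-heavy-500k-barc-llama3.1-8b-ins-fft-induction",
    num_train_epochs=2,              # epochs reported in the card
    learning_rate=1e-5,              # reported learning rate
    per_device_train_batch_size=16,  # 16 per device x 8 GPUs = 128 total
    per_device_eval_batch_size=16,
    lr_scheduler_type="cosine",      # cosine schedule
    warmup_ratio=0.1,                # 10% warmup
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # matches the BF16 listing above
)
```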

Performance

During training, the model achieved a final validation loss of 0.2710. The training loss decreased from 0.2797 in the first epoch to 0.2389 in the second epoch.
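
Assuming the reported value is the usual mean per-token cross-entropy in nats (the `transformers` Trainer default), the validation loss corresponds to a perplexity of about exp(0.2710) ≈ 1.31:

```python
# Convert cross-entropy loss to perplexity, assuming the reported value
# is mean per-token cross-entropy in nats.
import math

val_loss = 0.2710
print(f"perplexity ≈ {math.exp(val_loss):.3f}")  # ≈ 1.311
```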

Intended Use

The card does not state intended uses or limitations; independent evaluation and further documentation would be needed before deploying the model for any specific application.