Model Overview

This model, named ref_teacher, is an 8 billion parameter language model derived from Meta's Llama-3.1-8B base architecture. It has been fine-tuned using the HuggingFaceH4/deita-10k-v0-sft dataset, aiming to enhance its performance on various language tasks.

Training Details

The training process involved a learning rate of 2e-05, a train_batch_size of 4, and a gradient_accumulation_steps of 4, resulting in an effective total batch size of 64. The model was trained for 3 epochs using the AdamW optimizer with a cosine learning rate scheduler and a warmup ratio of 0.1. During training, the validation loss decreased from 0.9567 in the first epoch to 1.0297 by the third epoch.

Key Characteristics

Base Model: Meta Llama-3.1-8B
Fine-tuning Dataset: HuggingFaceH4/deita-10k-v0-sft
Parameter Count: 8 billion
Context Length: 32768 tokens

Potential Use Cases

Given its foundation on Llama-3.1-8B and fine-tuning on a general instruction dataset, this model is suitable for a range of natural language processing applications, including text generation, summarization, and question answering, where a robust general-purpose language model is required.

Overview

Model Overview

Training Details

Key Characteristics

Potential Use Cases

Full Model Card (README)