Elfsong/Llama-3.1-8B-Instruct-sft

Overview

Elfsong/Llama-3.1-8B-Instruct-sft is an instruction-tuned language model built on Meta-Llama-3.1-8B-Instruct. This 8-billion-parameter model has undergone supervised fine-tuning (SFT) to strengthen its instruction-following capabilities across a variety of tasks. The fine-tuning process used several datasets, including alpaca_en, qg_sft, wikiqa, webqa, and openo1_sft, to broaden its coverage of diverse prompts and response styles.

Key Characteristics

  • Base Model: Meta-Llama-3.1-8B-Instruct, providing a strong foundation for general language understanding and generation.
  • Parameter Count: 8 billion parameters, balancing capability against computational cost.
  • Context Length: Supports a context window of 32768 tokens, enabling long inputs and sustained conversational coherence.
  • Precision: Listed as FP8 on the model card.
  • Fine-tuning Datasets: A mix of instruction-following, question-generation, and question-answering datasets, suggesting improved performance in these areas.
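Since the model is published on the Hugging Face Hub, it can presumably be loaded with the standard Transformers API. A minimal inference sketch (untested; assumes a GPU with sufficient memory and that the repository ships a Llama 3.1 chat template):

```python
# Hypothetical inference sketch for Elfsong/Llama-3.1-8B-Instruct-sft.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Elfsong/Llama-3.1-8B-Instruct-sft"
MAX_CONTEXT = 32768  # context window stated in the model card


def generate_reply(prompt: str, max_new_tokens: int = 256) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # The Llama 3.1 chat template inserts the role/turn special tokens.
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate_reply("Write one quiz question about photosynthesis."))
```

The generation call is wrapped in a function so the (large) model download only happens when the script is run directly.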

Training Details

The model was fine-tuned for 1 epoch with a learning rate of 5e-05, a per-device batch size of 1 across 8 GPUs, and gradient accumulation for an effective batch size of 48 (i.e., 1 × 8 GPUs × 6 accumulation steps), using a cosine learning-rate scheduler with 100 warmup steps. This configuration aims to optimize the model's ability to follow instructions effectively.
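The dataset names (alpaca_en, openo1_sft, etc.) match LLaMA-Factory's built-in dataset registry, so the run may have used that framework; this is an assumption. A hypothetical LLaMA-Factory-style config mirroring the stated hyperparameters could look like:

```yaml
# Hypothetical SFT config; the framework and field names are assumptions,
# only the hyperparameter values come from the model card above.
model_name_or_path: meta-llama/Meta-Llama-3.1-8B-Instruct
stage: sft
do_train: true
dataset: alpaca_en,qg_sft,wikiqa,webqa,openo1_sft
cutoff_len: 32768
learning_rate: 5.0e-5
num_train_epochs: 1.0
per_device_train_batch_size: 1
gradient_accumulation_steps: 6   # 1 x 8 GPUs x 6 = effective batch size 48
lr_scheduler_type: cosine
warmup_steps: 100
```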