shuoxing/llama3-8b-full-sft-c4-1m-en-v2

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 8k | Published: May 29, 2026 | License: llama3 | Architecture: Transformer

shuoxing/llama3-8b-full-sft-c4-1m-en-v2 is an 8-billion-parameter causal language model based on Llama 3, produced by shuoxing through supervised fine-tuning (SFT) of a pre-trained Llama 3 variant on the alpaca_en dataset. It targets general text generation, with the fine-tuning intended to improve instruction following and conversational responses.


Model Overview

shuoxing/llama3-8b-full-sft-c4-1m-en-v2 is an 8-billion-parameter language model based on the Llama 3 architecture. Developed by shuoxing, it is a supervised fine-tuned (SFT) model trained on the alpaca_en dataset. It starts from the shuoxing/llama3-8b-full-pretrain-c4-1m-en base model, and the fine-tuning aims to improve that base model's instruction-following and conversational abilities.

Key Training Details

This model underwent fine-tuning with the following notable hyperparameters:

  • Learning Rate: 1e-05
  • Batch Size: A train_batch_size of 1 per GPU with gradient_accumulation_steps of 2 across 8 GPUs, giving a total_train_batch_size of 16 (1 × 2 × 8).
  • Optimizer: ADAMW_TORCH_FUSED with standard beta values and epsilon.
  • Scheduler: Cosine learning rate schedule with a warmup value of 0.1 (presumably a fraction of the total training steps, since 0.1 is not a whole number of steps).
  • Epochs: Trained for 3.0 epochs.
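
The batch-size arithmetic and the shape of the learning rate schedule listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the training code used for this model: the function names are hypothetical, the warmup value of 0.1 is treated as a fraction of total steps, warmup is assumed to be linear from zero, and the total step count of 1000 is made up because the actual number of steps is not given.

```python
import math


def total_train_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Number of samples that contribute to one optimizer update."""
    return per_device_batch * grad_accum_steps * num_gpus


def cosine_lr_with_warmup(
    step: int,
    total_steps: int,
    peak_lr: float = 1e-5,         # learning rate listed above
    warmup_fraction: float = 0.1,  # ASSUMPTION: 0.1 is a fraction of total steps
) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay from peak_lr to 0."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # 1 sample per GPU x 2 accumulation steps x 8 GPUs
    print(total_train_batch_size(1, 2, 8))  # 16

    hypothetical_total_steps = 1000  # not stated in the model card
    for step in (0, 50, 100, 550, 1000):
        print(step, cosine_lr_with_warmup(step, hypothetical_total_steps))
```

Under these assumptions the learning rate climbs to 1e-05 over the first tenth of training and then decays smoothly toward zero by the final step.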

Intended Use Cases

The available information does not detail specific intended uses. As a model supervised fine-tuned on the alpaca_en dataset, it is generally suited to tasks requiring:

  • Instruction following
  • Conversational AI
  • General text generation based on prompts
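
A minimal sketch of how these use cases might be exercised with the Hugging Face transformers library follows. Several parts are assumptions rather than documented facts about this model: the Alpaca-style prompt template (the template actually used during fine-tuning is not stated), the helper names build_prompt and generate, the max_new_tokens default, and the presence of the accelerate package, which device_map="auto" relies on.

```python
MODEL_ID = "shuoxing/llama3-8b-full-sft-c4-1m-en-v2"


def build_prompt(instruction: str, context: str = "") -> str:
    """Build an Alpaca-style prompt.

    ASSUMPTION: this is the commonly used Alpaca layout; the template used
    to fine-tune this model is not documented and may differ.
    """
    if context:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


def generate(instruction: str, context: str = "", max_new_tokens: int = 128) -> str:
    """Download the model and generate a response (needs a GPU with enough memory)."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(build_prompt(instruction, context), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(build_prompt("Summarize the text.", "Llama 3 is a family of language models."))
```

Calling generate("Explain what supervised fine-tuning is.") would download the weights on first use. If responses look malformed, the prompt template is the first thing to revisit, since it is the least certain assumption here.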

Detailed information on the model's specific strengths and limitations is not available, so users should not treat the list above as a comprehensive evaluation.