shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-sft-bs64

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 8k | Published: Mar 27, 2026 | License: llama3 | Architecture: Transformer

The shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-sft-bs64 model is an 8-billion-parameter language model, fine-tuned from shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-bs4 on the alpaca_en dataset. Built on the Llama 3 architecture, it is designed for general language understanding and generation tasks. It was trained with a learning rate of 1e-05 and a total batch size of 64 over 3 epochs, making it a reasonable fit for applications that need a moderately sized, instruction-tuned LLM.


Model Overview

This model, shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-sft-bs64, is an 8 billion parameter language model based on the Llama 3 architecture. It has been fine-tuned from the shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-bs4 base model.
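The checkpoint can be loaded with standard Hugging Face tooling. The sketch below is a minimal example, assuming the repository id above is available on the Hugging Face Hub; since the card does not describe a custom loader for the published FP8 quantization, the sketch simply loads the weights in bf16:

```python
# Minimal inference sketch (assumes the repo id is available on the Hugging Face Hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-sft-bs64"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 fallback; FP8 serving is typically handled by the hosting runtime
    device_map="auto",
)

prompt = "Explain what instruction tuning does to a base language model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```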

Key Characteristics

  • Base Model: Derived from a Llama 3 8B pre-trained variant.
  • Fine-tuning: Instruction-tuned on the alpaca_en dataset, which suits it to following natural-language instructions and generating human-like text.
  • Training Parameters: Fine-tuning used a learning rate of 1e-05, a total batch size of 64 (reached via gradient accumulation), and 3 epochs, with the AdamW optimizer and a cosine learning rate scheduler; a sketch of how these settings map to standard training arguments follows this list.
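The card does not publish the original training script, so the following is only a sketch of how the reported hyperparameters could be expressed with the Hugging Face TrainingArguments API. The split of the total batch size into per-device batch size and gradient-accumulation steps is an assumption, chosen so the product equals the reported 64 on a single device:

```python
# Hypothetical reconstruction of the reported hyperparameters using Hugging Face
# TrainingArguments. Only values marked "reported" come from the card; everything
# else is an assumption for illustration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-sft-bs64",   # hypothetical output path
    learning_rate=1e-5,                # reported
    num_train_epochs=3,                # reported
    per_device_train_batch_size=8,     # assumed split: 8 per device x 8 accumulation steps = 64
    gradient_accumulation_steps=8,     # assumed split
    lr_scheduler_type="cosine",        # reported scheduler
    optim="adamw_torch",               # reported optimizer family (AdamW)
    bf16=True,                         # common choice for Llama 3 fine-tuning (assumption)
    logging_steps=10,
)
```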

Potential Use Cases

Given its instruction-tuned nature and Llama 3 foundation, this model is likely suitable for:

  • General-purpose text generation.
  • Instruction-following tasks (a prompt-formatting sketch follows this list).
  • Chatbot applications.
  • Content creation and summarization in English.
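Because the model was tuned on alpaca_en, prompts in the standard Alpaca instruction format are a reasonable default. The exact template used during fine-tuning is not stated on this card, so treat the helper below as an assumption rather than the documented format:

```python
# Hypothetical Alpaca-style prompt formatting. This follows the standard Alpaca
# layout, a common default for models tuned on alpaca_en; the template actually
# used during fine-tuning is not documented on this card.
def format_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

print(format_alpaca_prompt("Summarize the plot of Hamlet in two sentences."))
```

The formatted string can be passed as the `prompt` in the inference sketch above in place of the raw instruction text.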