shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-bs4

Text generation
  • Model size: 8B
  • Quantization: FP8
  • Context length: 8k
  • Concurrency cost: 1
  • Published: Mar 26, 2026
  • License: llama3
  • Architecture: Transformer

The shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-bs4 model is an 8-billion-parameter language model based on the Llama 3 architecture. It was fine-tuned by shuoxing from shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, specifically on the c4_0_3m dataset. The model targets general language understanding tasks, combining its Llama 3 foundation with focused training on C4 web text.


Model Overview

This model, shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-bs4, is an 8 billion parameter language model built upon the Llama 3 architecture. It represents a fine-tuned iteration of the shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 model.

Key Training Details

  • Base Model: Fine-tuned from shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8.
  • Dataset: Training was conducted on the c4_0_3m dataset, suggesting a focus on general web text understanding and generation.
  • Hyperparameters: Key training parameters included a learning rate of 1e-05, a total batch size of 4 (across 4 devices), and 3 epochs of training using a cosine learning rate scheduler.
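The reported hyperparameters can be collected into a transformers-style configuration mapping. This is a minimal sketch: the learning rate, total batch size, epoch count, and scheduler type come from the model card above, while the per-device split (1 sample per device across 4 devices) and all other settings are assumptions or library defaults.

```python
# Hedged reconstruction of the reported fine-tuning hyperparameters.
# Values marked "reported" appear on the model card; the per-device
# batch split is an assumption inferred from "total batch size of 4
# (across 4 devices)".

training_config = {
    "learning_rate": 1e-05,              # reported
    "per_device_train_batch_size": 1,    # assumption: 4 devices x 1 sample
    "num_devices": 4,                    # reported
    "num_train_epochs": 3,               # reported
    "lr_scheduler_type": "cosine",       # reported
}

# Effective (total) batch size = per-device batch size x device count.
total_batch_size = (
    training_config["per_device_train_batch_size"]
    * training_config["num_devices"]
)
```

These keys mirror the names used by `transformers.TrainingArguments`, so the mapping could be splatted into a `Trainer` setup if one wanted to approximate the original run.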

Potential Use Cases

Given its foundation and training on the C4 dataset, this model is likely suitable for:

  • General text generation and completion.
  • Understanding and processing diverse web-based content.
  • Serving as a base for further fine-tuning on more specific downstream tasks.
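For the text-generation use case, a loading sketch with the Hugging Face transformers library is shown below. This assumes the repository ships standard Llama 3 weight and tokenizer files; the sampling settings (`max_new_tokens`, `temperature`) are illustrative choices, not values from the model card. Note that no chat template is applied, since this is a plain pretrained model rather than an instruction-tuned one.

```python
# Sketch: text completion with this checkpoint via Hugging Face
# transformers. Assumption: the repo loads with the standard Auto
# classes; generation settings are illustrative only.

MODEL_ID = "shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-bs4"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Sample a completion from the base model (no chat template)."""
    # Heavy dependencies imported lazily so the module stays importable
    # without torch/transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
    )
    return tok.decode(out[0], skip_special_tokens=True)
```

Usage would be e.g. `generate("The history of the web began")`, which returns the prompt plus a sampled continuation.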