shuoxing/llama3-8b-full-pretrain-wash-c4-3-0m-bs4
shuoxing/llama3-8b-full-pretrain-wash-c4-3-0m-bs4 is an 8-billion-parameter Llama 3 variant fine-tuned by shuoxing on the c4_3_0m dataset, a corpus drawn from C4. It is intended for research and development in language modeling, particularly for experiments where the characteristics of the C4 dataset are relevant.
Model Overview
This model, shuoxing/llama3-8b-full-pretrain-wash-c4-3-0m-bs4, is an 8-billion-parameter Llama 3-based language model. It was fine-tuned by shuoxing from a previous iteration, shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, on the c4_3_0m dataset.
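A minimal loading sketch with the Transformers library is shown below. The repository id comes from this card; the dtype and device placement are assumptions for fitting an 8B model on a single modern GPU.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-3-0m-bs4"

# Load tokenizer and weights; bfloat16 and device_map="auto" are
# assumptions, not settings documented on the card.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```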
Training Details
The fine-tuning process used the following key hyperparameters (collected into a configuration sketch after the list):
- Learning Rate: 1e-05
- Batch Size: 4 total (1 per device across 4 GPUs)
- Optimizer: AdamW with default betas and epsilon
- Scheduler: Cosine learning rate schedule with a warmup ratio of 0.1
- Epochs: 3.0
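Assembled as a Hugging Face TrainingArguments object, the values above look roughly as follows. This is an illustrative sketch using standard Trainer argument names, not the developer's actual training script; output_dir and the bf16 flag are assumptions.

```python
from transformers import TrainingArguments

# Hyperparameters as reported on the model card; everything else
# (output_dir, precision) is illustrative.
training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-wash-c4-3-0m-bs4",
    learning_rate=1e-5,
    per_device_train_batch_size=1,  # 4 GPUs -> total batch size of 4
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,               # interpreting the card's "0.1" as a ratio
    optim="adamw_torch",            # AdamW with default betas and epsilon
    bf16=True,                      # assumption, common for Llama 3 training
)
```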
Frameworks Used
The model was trained using:
- Transformers 5.2.0
- PyTorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.22.2
Intended Use Cases
Given its fine-tuning on C4-derived data, this model is suited to research and experimentation where the characteristics of the C4 dataset are relevant. Specific intended uses and limitations have not been documented by the model developer.
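For a quick qualitative check, the checkpoint can be exercised like any causal Llama 3 model through the text-generation pipeline; the prompt and decoding settings below are illustrative.

```python
from transformers import pipeline

# Quick sanity check; sampling parameters are illustrative, not
# recommendations from the model developer.
generator = pipeline(
    "text-generation",
    model="shuoxing/llama3-8b-full-pretrain-wash-c4-3-0m-bs4",
    torch_dtype="auto",
    device_map="auto",
)
result = generator("The quick brown fox", max_new_tokens=64, do_sample=True)
print(result[0]["generated_text"])
```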