Model Overview
This model, shuoxing/llama3-8b-full-pretrain-wash-c4-2-7m-bs4, is an 8-billion-parameter language model fine-tuned from shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, indicating a continuation of pre-training or adaptation from an earlier checkpoint. Judging by the naming convention, the base architecture is likely derived from the Llama 3 family.
Training Details
The model underwent training with the following key hyperparameters:
- Learning Rate: 1e-05
- Optimizer: AdamW (`adamw_torch`) with default betas and epsilon
- LR Scheduler: cosine, with a warmup ratio of 0.1
- Epochs: 3.0
- Batch Size: A total training batch size of 4 (1 per device across 4 GPUs).
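The scheduler above can be sketched in plain Python: a linear warmup over the first 10% of steps, followed by cosine decay of the learning rate toward zero. This is a minimal illustration of the schedule shape, not the exact implementation used during training; `total_steps` is an illustrative placeholder, since the real step count depends on the (undocumented) dataset size.

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 1e-05,
          warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay.

    Sketch of a cosine-with-warmup schedule, assuming the 0.1 warmup
    ratio and 1e-05 peak learning rate listed in the model card.
    """
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp from ~0 up to the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with `total_steps=1000` the rate peaks at step 99 (end of warmup) and decays smoothly afterwards.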
Current Status and Information Gaps
The model card does not document the fine-tuning dataset, the model's intended uses, its limitations, or any evaluation results. This suggests it is an intermediate or experimental checkpoint rather than a fully documented, production-ready model.
When to Consider Using This Model
Given the limited information, this model is primarily suitable for:
- Research and Experimentation: Developers interested in exploring the effects of specific fine-tuning parameters or continuing pre-training from this checkpoint.
- Understanding Training Processes: Analyzing the training configuration (hyperparameters, optimizer, scheduler) for similar Llama 3-based models.
It is not recommended for general production use cases without further evaluation and understanding of its capabilities and limitations.
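For the research and experimentation use cases above, a minimal loading sketch follows. It assumes the checkpoint is hosted on the Hugging Face Hub under the repo id from this card and that the `transformers` library is installed; an 8B model needs roughly 16 GB of memory in bf16.

```python
MODEL_ID = "shuoxing/llama3-8b-full-pretrain-wash-c4-2-7m-bs4"

def load_for_research(model_id: str = MODEL_ID):
    """Load the checkpoint for inspection or continued training.

    Hedged sketch: requires the `transformers` library (imported lazily
    so this module stays importable without it) and downloads ~16 GB
    of weights in bf16.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
    return tokenizer, model
```

Evaluate any outputs carefully before drawing conclusions, given the missing documentation noted above.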