Name: shuoxing/llama3-8b-full-sft-c4-1m-en-v2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: shuoxing

Model Overview

The shuoxing/llama3-8b-full-sft-c4-1m-en-v2 is an 8 billion parameter language model based on the Llama 3 architecture. Developed by shuoxing, this model is a supervised fine-tuned (SFT) version, specifically trained on the alpaca_en dataset. It builds upon the shuoxing/llama3-8b-full-pretrain-c4-1m-en base model, aiming to improve its instruction-following and conversational abilities.

Key Training Details

This model underwent fine-tuning with the following notable hyperparameters:

Learning Rate: 1e-05
Batch Size: A train_batch_size of 1 and gradient_accumulation_steps of 2 resulted in a total_train_batch_size of 16 across 8 GPUs.
Optimizer: Utilized ADAMW_TORCH_FUSED with standard beta values and epsilon.
Scheduler: Employs a cosine learning rate scheduler with 0.1 warmup steps.
Epochs: Trained for 3.0 epochs.

Intended Use Cases

While specific intended uses are not detailed in the provided information, as a supervised fine-tuned model on the alpaca_en dataset, it is generally suitable for tasks requiring:

Instruction following
Conversational AI
General text generation based on prompts

Users should be aware that more detailed information regarding its specific strengths and limitations is needed for comprehensive evaluation.

Overview

Model Overview

Key Training Details

Intended Use Cases

Full Model Card (README)