W-61/llama3-8b-base-new-method-q_t-0.4-s_star0.6-beta-next-batch
W-61/llama3-8b-base-new-method-q_t-0.4-s_star0.6-beta-next-batch is an 8 billion parameter Llama 3 base model, fine-tuned by W-61. This model is a fine-tuned version of W-61/llama-3-8b-base-sft-ultrachat-8xh200, specifically trained on the HuggingFaceH4/ultrafeedback_binarized dataset. It is designed for general language understanding and generation tasks, leveraging its Llama 3 architecture and targeted fine-tuning for improved performance in conversational contexts.
Model Overview
This model, llama3-8b-base-new-method-q_t-0.4-s_star0.6-beta-next-batch, is an 8 billion parameter language model developed by W-61. It is a fine-tuned iteration of the W-61/llama-3-8b-base-sft-ultrachat-8xh200 model, specifically enhanced through training on the HuggingFaceH4/ultrafeedback_binarized dataset.
Training Details
The model underwent a single epoch of training with a learning rate of 5e-07. Key training hyperparameters include:
- Optimizer: ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08.
- Batch Size: A `train_batch_size` of 8 and `eval_batch_size` of 8, with 4 gradient accumulation steps across 4 multi-GPU devices, giving a `total_train_batch_size` of 128.
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
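The numbers above fit together as follows: the effective batch size is the per-device batch multiplied by the device count and the gradient accumulation steps, and the cosine scheduler ramps the learning rate linearly over the first 10% of steps before decaying it. A minimal sketch (the total step count below is an illustrative assumption, not taken from the training run):

```python
import math

# Hyperparameters stated in the card.
PEAK_LR = 5e-07
PER_DEVICE_BATCH = 8
NUM_GPUS = 4
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.1

def effective_batch_size(per_device, num_devices, accum_steps):
    """Examples contributing to each optimizer update."""
    return per_device * num_devices * accum_steps

def cosine_lr_with_warmup(step, total_steps,
                          peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay toward zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch_size(PER_DEVICE_BATCH, NUM_GPUS, GRAD_ACCUM_STEPS))  # 128
```

This matches the card's reported `total_train_batch_size` of 128 (8 × 4 × 4); the scheduler shape mirrors what the Transformers cosine scheduler with warmup computes per step.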
Intended Use
The card does not document specific intended uses or limitations, but fine-tuning on a feedback-binarized dataset suggests potential strengths in:
- Conversational AI: Generating more aligned and helpful responses in dialogue systems.
- Instruction Following: Improved ability to adhere to given instructions based on human feedback data.
This model is built upon the robust Llama 3 architecture, making it suitable for a wide range of natural language processing tasks where an 8B parameter model is appropriate.