shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8
The shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 model is an 8 billion parameter Llama 3 variant, fine-tuned from Meta-Llama-3-8B-Instruct. It specializes in processing and generating content related to the 'junk_tweet_1m_en_new' dataset. Its primary application is tasks involving English-language social media data, particularly tweets, and it supports a context length of 8192 tokens.
Model Overview
This model, shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, is an 8 billion parameter language model based on the Meta-Llama-3-8B-Instruct architecture. It has been specifically fine-tuned on the junk_tweet_1m_en_new dataset.
Key Characteristics
- Base Model: Fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct.
- Parameter Count: 8 billion parameters.
- Context Length: Supports an 8192-token context window.
- Specialization: Optimized for tasks related to English-language social media content, particularly tweets, due to its training on the junk_tweet_1m_en_new dataset.
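Assuming the checkpoint is published on the Hugging Face Hub under the name above, it should load with the standard transformers API. This is a minimal sketch rather than an official usage example from the model authors; the dtype and device-placement settings are assumptions.

```python
MODEL_ID = "shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8"

def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model from the Hugging Face Hub.

    The import is deferred so this sketch can be read/imported without
    `transformers` installed; actually loading pulls ~16 GB of weights.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",  # assumed: keep the checkpoint's native dtype
        device_map="auto",   # assumed: requires the `accelerate` package
    )
    return tokenizer, model
```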
Training Details
The model was trained with the following key hyperparameters:
- Learning Rate: 1e-05
- Optimizer: adamw_torch_fused
- LR Scheduler: Cosine, with a 0.1 warmup ratio.
- Epochs: 3.0
- Batch Size: A total training batch size of 8 across 8 GPUs.
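A cosine schedule with a 0.1 warmup ratio ramps the learning rate linearly from zero to the peak (1e-05 here) over the first 10% of training steps, then decays it along a half-cosine toward zero. A pure-Python sketch of that shape (broadly matching transformers' cosine-with-warmup scheduler, though exact step conventions and minimum-LR floors may differ):

```python
import math

def lr_at_step(step, total_steps, peak_lr=1e-5, warmup_ratio=0.1):
    """Learning rate under linear warmup followed by cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # linear warmup from 0 to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # half-cosine decay from peak_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total steps the rate reaches 1e-05 at step 100 and decays back to zero by the final step.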
Intended Use Cases
Given its fine-tuning on a specific tweet dataset, this model is best suited for applications requiring an understanding or generation of English social media text, especially in contexts similar to the junk_tweet_1m_en_new dataset. Potential uses include tweet analysis, content generation for social media, or research into specific linguistic patterns found in online discourse.
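The generation use cases above can be sketched with a small helper built on a loaded tokenizer/model pair. This is a hypothetical illustration: the prompt format and sampling settings are assumptions, not documented by the model authors.

```python
def continue_tweet(prompt, tokenizer, model, max_new_tokens=64):
    """Generate a continuation of an English tweet-style prompt."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,   # assumed: sampling suits informal social-media text
        temperature=0.8,  # assumed default, not from the model card
    )
    # keep only the newly generated tokens, dropping the echoed prompt
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

In practice this would be called as `continue_tweet("just landed in tokyo and", tokenizer, model)` after loading the checkpoint; truncating inputs to the 8192-token context window is left to the caller.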