shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8
The shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 model is an 8 billion parameter Llama 3 variant, fine-tuned from Meta-Llama-3-8B-Instruct. It specializes in processing and generating content related to the 'junk_tweet_1m_en_new' dataset. Its primary application is tasks involving English-language social media data, particularly tweets, and it supports a context length of 8192 tokens.
Model Overview
This model, shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, is an 8 billion parameter language model based on the Meta-Llama-3-8B-Instruct architecture. It has been specifically fine-tuned on the junk_tweet_1m_en_new dataset.
Key Characteristics
- Base Model: Fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct.
- Parameter Count: 8 billion parameters.
- Context Length: Supports an 8192-token context window.
- Specialization: Optimized for tasks related to English-language social media content, particularly tweets, due to its training on the junk_tweet_1m_en_new dataset.
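Assuming the checkpoint is published on the Hugging Face Hub under the name above, it should load with the standard transformers API. This is a minimal sketch rather than an official usage example from the model authors; the dtype and device-placement settings are assumptions.

```python
MODEL_ID = "shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8"

def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model from the Hugging Face Hub.

    The import is deferred so this sketch can be read/imported without
    `transformers` installed; actually loading pulls ~16 GB of weights.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",  # assumed: keep the checkpoint's native dtype
        device_map="auto",   # assumed: requires the `accelerate` package
    )
    return tokenizer, model
```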
Training Details
The model was trained with the following key hyperparameters:
- Learning Rate: 1e-05
- Optimizer: adamw_torch_fused
- LR Scheduler: Cosine, with a 0.1 warmup ratio.
- Epochs: 3.0
- Batch Size: A total training batch size of 8 across 8 GPUs.
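A cosine schedule with a 0.1 warmup ratio ramps the learning rate linearly from zero to the peak (1e-05 here) over the first 10% of training steps, then decays it along a half-cosine toward zero. A pure-Python sketch of that shape (broadly matching transformers' cosine-with-warmup scheduler, though exact step conventions and minimum-LR floors may differ):

```python
import math

def lr_at_step(step, total_steps, peak_lr=1e-5, warmup_ratio=0.1):
    """Learning rate under linear warmup followed by cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # linear warmup from 0 to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # half-cosine decay from peak_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total steps the rate reaches 1e-05 at step 100 and decays back to zero by the final step.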
Intended Use Cases
Given its fine-tuning on a specific tweet dataset, this model is best suited for applications requiring an understanding or generation of English social media text, especially in contexts similar to the junk_tweet_1m_en_new dataset. Potential uses include tweet analysis, content generation for social media, or research into specific linguistic patterns found in online discourse.
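The generation use cases above can be sketched with a small helper built on a loaded tokenizer/model pair. This is a hypothetical illustration: the prompt format and sampling settings are assumptions, not documented by the model authors.

```python
def continue_tweet(prompt, tokenizer, model, max_new_tokens=64):
    """Generate a continuation of an English tweet-style prompt."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,   # assumed: sampling suits informal social-media text
        temperature=0.8,  # assumed default, not from the model card
    )
    # keep only the newly generated tokens, dropping the echoed prompt
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

In practice this would be called as `continue_tweet("just landed in tokyo and", tokenizer, model)` after loading the checkpoint; truncating inputs to the 8192-token context window is left to the caller.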