shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Dec 24, 2025 · License: llama3 · Architecture: Transformer

shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 is an 8-billion-parameter Llama 3 variant fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the junk_tweet_1m_en_new dataset. It is intended for processing and generating English-language social media text, particularly tweets, and supports a context length of 8192 tokens.

Model Overview

This model, shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, is an 8-billion-parameter language model fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the junk_tweet_1m_en_new dataset.

Key Characteristics

  • Base Model: Fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct.
  • Parameter Count: 8 billion parameters.
  • Context Length: Supports an 8192-token context window (see the config-check sketch after this list).
  • Specialization: Optimized for tasks related to English-language social media content, particularly tweets, due to its training on the junk_tweet_1m_en_new dataset.
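
These characteristics can be sanity-checked from the published configuration without downloading the full 8B weights. A minimal sketch using the Hugging Face transformers config API, assuming the model is hosted on the Hub under the name above:

```python
from transformers import AutoConfig

MODEL_ID = "shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8"

# Fetch only the config file; this avoids downloading the 8B-parameter weights.
config = AutoConfig.from_pretrained(MODEL_ID)

print(config.model_type)               # expected: "llama"
print(config.max_position_embeddings)  # expected: 8192 (the 8k context window)
```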

Training Details

The model was trained with the following key hyperparameters; a hedged configuration sketch follows the list:

  • Learning Rate: 1e-05
  • Optimizer: adamw_torch_fused
  • LR Scheduler: Cosine, with a warmup ratio of 0.1.
  • Epochs: 3.0
  • Batch Size: A total training batch size of 8 across 8 GPUs.
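
The training script itself is not published, so the sketch below is only a hypothetical reconstruction of the reported values using the Hugging Face TrainingArguments API. The per-device batch size and precision are assumptions: a total batch size of 8 across 8 GPUs implies 1 sample per device if no gradient accumulation was used.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-junk-tweet-1m-en-reproduce-bs8",  # hypothetical path
    learning_rate=1e-5,             # reported learning rate
    optim="adamw_torch_fused",      # reported optimizer
    lr_scheduler_type="cosine",     # reported scheduler type
    warmup_ratio=0.1,               # reported warmup ratio
    num_train_epochs=3.0,           # reported epochs
    per_device_train_batch_size=1,  # assumed: 1 x 8 GPUs = total batch size 8
    bf16=True,                      # assumed mixed precision; not stated in the card
)
```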

Intended Use Cases

Given its fine-tuning on a specific tweet dataset, this model is best suited for applications that require understanding or generating English social media text, especially in contexts similar to the junk_tweet_1m_en_new dataset. Potential uses include tweet analysis, social media content generation, and research into linguistic patterns in online discourse.
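
For reference, here is a minimal inference sketch with the transformers library, assuming the checkpoint is available on the Hugging Face Hub under its card name; the prompt, dtype, and sampling settings are illustrative rather than taken from the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # assumed local dtype; the hosted endpoint lists FP8
    device_map="auto",
)

# Prompt in the style of the tweet data the model was fine-tuned on.
prompt = "Write a short tweet about Monday mornings:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```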