shuoxing/qwen2-5-7b-full-pretrain-mix-high-tweet-1m-en-reproduce-bs8

Text generation · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Jan 22, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

shuoxing/qwen2-5-7b-full-pretrain-mix-high-tweet-1m-en-reproduce-bs8 is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It was trained on the mix_high_tweet_1m_new dataset, which points to an optimization for short-form, high-volume content such as social media text. Built on the Qwen2.5 architecture, its most likely applications are processing and generating text in domains characterized by that kind of data.


Model Overview

This model, shuoxing/qwen2-5-7b-full-pretrain-mix-high-tweet-1m-en-reproduce-bs8, is a fine-tuned variant of the Qwen2.5-7B-Instruct base model. It has 7.6 billion parameters and is configured with a maximum context length of 131,072 tokens.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen2.5-7B-Instruct, indicating a strong foundation in instruction following and general language understanding.
  • Specialized Training Data: The model underwent further training on the mix_high_tweet_1m_new dataset. This suggests a specialization in processing and generating content similar to high-volume social media posts, potentially enhancing its performance on informal or concise text.
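
As a standard Qwen2.5-family checkpoint, the model should load with the Hugging Face transformers library. Below is a minimal sketch, assuming the weights are hosted on the Hub under the repo id shown above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the checkpoint is available on the Hugging Face Hub
# under the same repo id as this model card.
model_id = "shuoxing/qwen2-5-7b-full-pretrain-mix-high-tweet-1m-en-reproduce-bs8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # spread the 7.6B parameters across available GPUs
)
```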

Training Details

The fine-tuning process involved specific hyperparameters:

  • Learning Rate: 1e-05
  • Optimizer: adamw_torch_fused (PyTorch's fused AdamW implementation)
  • Epochs: 3.0
  • Batch Size: A total training batch size of 8 across 8 devices (1 example per device).
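
For reference, these values map onto Hugging Face TrainingArguments roughly as sketched below. This is an illustrative reconstruction, not the author's actual training script: the output directory is hypothetical, the per-device batch size of 1 is inferred from the stated totals, and bf16 is an assumption.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2-5-7b-tweet-ft",  # hypothetical output path
    learning_rate=1e-5,                # reported learning rate
    optim="adamw_torch_fused",         # reported optimizer
    num_train_epochs=3.0,              # reported epochs
    per_device_train_batch_size=1,     # 1 per device x 8 devices = total batch size 8
    bf16=True,                         # assumption: mixed precision, typical for 7B fine-tunes
)
```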

Potential Use Cases

Given its fine-tuning on a tweet-like dataset, this model could be particularly effective for:

  • Analyzing and generating short-form text.
  • Tasks related to social media content understanding or creation.
  • Applications requiring processing of informal language styles.
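
Continuing from the loading sketch above, a short-form generation call might look like the following. The prompt and sampling settings are illustrative, not from the model card; since the base model is an Instruct variant, the tokenizer's chat template is used.

```python
# Reuses `model` and `tokenizer` from the loading example above.
messages = [
    {"role": "user", "content": "Write a one-sentence post about open-source AI."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=64,   # keep the output short-form
    do_sample=True,
    temperature=0.7,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```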