shuoxing/qwen2-5-7b-full-pretrain-mix-low-tweet-1m-en-reproduce-bs8

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kPublished:Jan 22, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The shuoxing/qwen2-5-7b-full-pretrain-mix-low-tweet-1m-en-reproduce-bs8 model is a 7.6 billion parameter language model, fine-tuned from Qwen/Qwen2.5-7B-Instruct. It was specifically trained on the mix_low_tweet_1m_new dataset, suggesting an optimization for processing and generating content related to social media or short-form text. This model is designed for tasks requiring nuanced understanding and generation within a specific, potentially informal, textual domain.

Loading preview...

Overview

This model, shuoxing/qwen2-5-7b-full-pretrain-mix-low-tweet-1m-en-reproduce-bs8, is a 7.6 billion parameter language model. It is a fine-tuned variant of the Qwen/Qwen2.5-7B-Instruct base model, developed by Qwen. The fine-tuning process specifically utilized the mix_low_tweet_1m_new dataset.

Key Characteristics

  • Base Model: Qwen/Qwen2.5-7B-Instruct.
  • Parameter Count: 7.6 billion parameters.
  • Context Length: 131,072 tokens.
  • Training Data: Fine-tuned on the mix_low_tweet_1m_new dataset, indicating a specialization in short-form, potentially social media-style text.

Training Details

The model underwent training with a learning rate of 1e-05, a train_batch_size of 1, and num_epochs set to 3.0. It utilized a multi-GPU setup with 8 devices and an AdamW optimizer with cosine learning rate scheduling and a 0.1 warmup ratio.

Intended Use Cases

Given its fine-tuning on a dataset likely comprising short, informal text, this model is potentially well-suited for tasks such as:

  • Generating or analyzing social media posts.
  • Understanding and responding to short-form conversational text.
  • Applications requiring text generation with a specific, concise style.