shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-reproduce-bs8

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Jan 22, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-reproduce-bs8 model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct on the control_tweet_1m_new dataset, which suggests optimization for tasks related to tweet analysis or generation. Its primary differentiator is this specialized fine-tuning, making it suitable for applications that require nuanced understanding or generation of social-media text, particularly tweets.


Overview

This model, shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-reproduce-bs8, is a specialized fine-tuned version of the Qwen2.5-7B-Instruct base model. It has been adapted using the control_tweet_1m_new dataset, indicating a focus on processing or generating content related to social media tweets.
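The snippet below is a minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub under this repo ID and retains the standard Qwen2.5 chat template; the example prompt is purely illustrative.

```python
# Minimal inference sketch. Assumes the checkpoint is fetchable from the Hub
# under this repo ID and uses the standard Qwen2.5 chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-reproduce-bs8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative prompt; the README documents no specific prompt format.
messages = [{"role": "user", "content":
             "Summarize this tweet in one sentence: "
             "\"Shipped the new release today. Huge thanks to everyone who filed bugs!\""}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```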

Key Characteristics

  • Base Model: Qwen/Qwen2.5-7B-Instruct
  • Fine-tuning Dataset: control_tweet_1m_new
  • Parameter Count: 7.6 billion parameters
  • Training Hyperparameters (mapped to a configuration sketch after this list):
    • Learning Rate: 1e-05
    • Optimizer: AdamW (fused) with betas=(0.9, 0.999) and epsilon=1e-08
    • Batch Size: 1 per device for training and 8 per device for evaluation, for effective sizes of 8 (train) and 64 (eval) across 8 GPUs
    • Epochs: 3.0
    • LR Scheduler: Cosine with a 0.1 warmup ratio
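The hyperparameters above translate directly into Hugging Face TrainingArguments, as sketched below; the output directory is a placeholder, and the README does not state that the Trainer API was actually used.

```python
# Hypothetical TrainingArguments mirroring the reported hyperparameters.
# output_dir is an illustrative placeholder, not a value from the README.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2-5-7b-control-tweet",  # placeholder
    learning_rate=1e-05,
    optim="adamw_torch_fused",              # fused AdamW
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    per_device_train_batch_size=1,          # x8 GPUs -> effective 8
    per_device_eval_batch_size=8,           # x8 GPUs -> effective 64
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```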

Intended Use Cases

Given its fine-tuning on a tweet-specific dataset, this model is likely best suited for applications involving:

  • Tweet analysis (e.g., sentiment, topic extraction)
  • Tweet generation or summarization
  • Social media content understanding
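As a concrete illustration of the first use case, the pipeline call below prompts the model for tweet sentiment; the label set and prompt wording are assumptions, since the README documents no prompt format.

```python
# Illustrative sentiment-analysis prompt; the positive/negative/neutral label
# set is an assumption, not documented model behavior.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-reproduce-bs8",
    device_map="auto",
)
messages = [{"role": "user", "content":
             "Label the sentiment of this tweet as positive, negative, or neutral:\n"
             "\"Flight delayed three hours and no updates from the airline.\""}]
result = generator(messages, max_new_tokens=16)
print(result[0]["generated_text"][-1]["content"])  # assistant reply only
```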

Limitations

The README indicates that more information is needed regarding specific intended uses, limitations, and detailed training/evaluation data. Users should exercise caution and conduct further evaluation for critical applications.