shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-reproduce-bs8
The shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-reproduce-bs8 model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It was adapted on the control_tweet_1m_new dataset, which suggests optimization for tweet analysis or generation. Its primary differentiator is this specialized fine-tuning, making it suitable for applications that require nuanced understanding or generation of social media text, particularly tweets.
Overview
This model, shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-reproduce-bs8, is a specialized fine-tuned version of the Qwen2.5-7B-Instruct base model. It has been adapted using the control_tweet_1m_new dataset, indicating a focus on processing or generating content related to social media tweets.
Key Characteristics
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Fine-tuning Dataset: control_tweet_1m_new
- Parameter Count: 7.6 billion
- Training Hyperparameters:
- Learning Rate: 1e-05
- Optimizer: AdamW (fused) with betas=(0.9, 0.999) and epsilon=1e-08
- Batch Size: 1 (train) and 8 (eval) per device; effective batch sizes of 8 (train) and 64 (eval) across 8 GPUs
- Epochs: 3.0
- LR Scheduler: Cosine with 0.1 warmup ratio
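The cosine scheduler with a 0.1 warmup ratio means the learning rate ramps up linearly over the first 10% of training steps to the peak of 1e-05, then decays along a cosine curve. A minimal sketch of this schedule (the function name and step counts are illustrative, not taken from the training code):

```python
import math

def lr_at_step(step, total_steps, peak_lr=1e-05, warmup_ratio=0.1):
    """Cosine LR schedule with linear warmup, mirroring the card's settings."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total steps the rate is 0 at step 0, peaks at 1e-05 at step 100, and decays back toward 0 by step 1000.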
Intended Use Cases
Given its fine-tuning on a tweet-specific dataset, this model is likely best suited for applications involving:
- Tweet analysis (e.g., sentiment, topic extraction)
- Tweet generation or summarization
- Social media content understanding
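Since the model derives from Qwen2.5-7B-Instruct, it presumably expects the base model's ChatML-style prompt format. A minimal sketch of assembling such a prompt for a tweet-analysis task (the template is an assumption carried over from the base model; in practice, prefer `tokenizer.apply_chat_template` from transformers, which reads the template shipped with the checkpoint):

```python
def build_chatml_prompt(user_message, system_message="You are a helpful assistant."):
    """Assemble a ChatML-style prompt as used by Qwen2.5-Instruct models.

    NOTE: the template is assumed to carry over unchanged from the base
    model; this model card does not document a custom prompt format.
    """
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("Classify the sentiment of this tweet: 'Best launch day ever!'")
```

The resulting string can be tokenized and passed to the model for generation, for example via `AutoModelForCausalLM.from_pretrained` in the transformers library.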
Limitations
The model card notes that more information is needed on specific intended uses, limitations, and detailed training/evaluation data. Users should exercise caution and conduct their own evaluation before deploying the model in critical applications.