shuoxing/qwen2-5-7b-full-pretrain-mix-low-tweet-1m-en-reproduce-bs8
The shuoxing/qwen2-5-7b-full-pretrain-mix-low-tweet-1m-en-reproduce-bs8 model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct on the mix_low_tweet_1m_new dataset. The dataset name suggests tweet-style data, so the model is likely oriented toward processing and generating short-form, informal, social-media-style text.
Overview
This model, shuoxing/qwen2-5-7b-full-pretrain-mix-low-tweet-1m-en-reproduce-bs8, is a 7.6-billion-parameter language model. It is a fine-tuned variant of Qwen/Qwen2.5-7B-Instruct, a base model developed by Qwen. Fine-tuning used the mix_low_tweet_1m_new dataset.
Key Characteristics
- Base Model: Qwen/Qwen2.5-7B-Instruct.
- Parameter Count: 7.6 billion parameters.
- Context Length: 131,072 tokens.
- Training Data: Fine-tuned on the mix_low_tweet_1m_new dataset, indicating a specialization in short-form, potentially social media-style text.
Training Details
The model was trained for 3 epochs (num_epochs 3.0) with a learning rate of 1e-05 and a train_batch_size of 1. Training ran on a multi-GPU setup with 8 devices and used the AdamW optimizer with a cosine learning rate schedule and a warmup ratio of 0.1.
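To make these hyperparameters concrete, the following is a minimal sketch of a cosine schedule with linear warmup, plus the effective batch size the settings would imply. It is not the authors' training code. It assumes that train_batch_size is a per-device value, that no gradient accumulation was used (the card does not say), and that the schedule decays to zero. The total step count passed to `lr_at_step` is a hypothetical placeholder, since the card does not state the number of optimizer steps.

```python
import math

# Values stated in the training details.
PEAK_LR = 1e-05
WARMUP_RATIO = 0.1
NUM_DEVICES = 8

# Assumptions, not stated in the card.
PER_DEVICE_BATCH_SIZE = 1  # assumes train_batch_size is per device
GRAD_ACCUM_STEPS = 1       # assumes no gradient accumulation


def effective_batch_size(per_device=PER_DEVICE_BATCH_SIZE,
                         devices=NUM_DEVICES,
                         accum=GRAD_ACCUM_STEPS):
    """Number of sequences contributing to one optimizer step."""
    return per_device * devices * accum


def lr_at_step(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: progress runs from 0 to 1 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions, `effective_batch_size()` evaluates to 8, which would be consistent with the "bs8" suffix in the model name. With a hypothetical 1,000 total steps, the learning rate would reach its peak at step 100 and decay back toward zero by step 1,000.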
Intended Use Cases
Given its fine-tuning on a dataset likely comprising short, informal text, this model is potentially well-suited for tasks such as:
- Generating or analyzing social media posts.
- Understanding and responding to short-form conversational text.
- Applications requiring text generation with a specific, concise style.
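As an illustration of the use cases above, here is a hedged sketch of generating a short post with the Hugging Face transformers library. It assumes the repository is published on the Hugging Face Hub under the model ID shown, that it loads with `AutoModelForCausalLM`, and that `torch`, `transformers`, and `accelerate` are installed. The helper name `generate_short_text`, the prompt, and the sampling settings are hypothetical choices for illustration, not recommendations from the model authors.

```python
MODEL_ID = "shuoxing/qwen2-5-7b-full-pretrain-mix-low-tweet-1m-en-reproduce-bs8"


def generate_short_text(prompt, max_new_tokens=60, temperature=0.7):
    """Continue `prompt` with a short sampled completion (illustrative helper)."""
    # Imported lazily so that defining this function does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",  # use the dtype stored in the checkpoint
        device_map="auto",   # requires accelerate; places weights on available devices
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=temperature,
    )
    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call such as `generate_short_text("Just tried the new coffee place downtown and")` would download the weights on first use and return a sampled continuation. The sketch uses plain text continuation rather than a chat template because the card does not say which prompt format the fine-tuning data used.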