ShourenWSR/HT-phase_scale-Qwen-140k-phase2
ShourenWSR/HT-phase_scale-Qwen-140k-phase2 is a 7.6-billion-parameter language model fine-tuned from a previous Qwen phase1 checkpoint on the phase2_140k dataset. It is one iteration in a multi-phase training process and is designed for general language understanding and generation tasks, leveraging the Qwen architecture for broad applicability.
Model Overview
The model represents the second phase of fine-tuning in a multi-stage training pipeline: it builds on the earlier checkpoint Qwen_phase1_140k and is further fine-tuned on a dataset referred to as phase2_140k, an iterative approach aimed at progressively refining the model's capabilities.
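Assuming the checkpoint follows the standard Hugging Face layout for Qwen-family models (not confirmed by the card itself), it can be loaded with transformers as sketched below; the dtype and device settings are illustrative choices, not values stated here.

```python
# Minimal loading sketch, assuming a standard transformers-compatible
# Qwen-family checkpoint layout.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ShourenWSR/HT-phase_scale-Qwen-140k-phase2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; adjust to your hardware
    device_map="auto",           # assumed; requires the accelerate package
)
```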
Training Details
The model was trained with the following hyperparameters:
- Learning Rate: 1e-05
- Batch Size: A `train_batch_size` of 1 with `gradient_accumulation_steps` of 12 yielded a `total_train_batch_size` of 24 (1 × 12 × 2 GPUs).
- Optimizer: `adamw_torch` with default betas and epsilon.
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
- Epochs: Trained for 3.0 epochs.
- Environment: Distributed training across 2 GPUs.
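These hyperparameters map naturally onto transformers' TrainingArguments. The sketch below is a hypothetical reconstruction of that configuration, not the author's published training script; output_dir is a placeholder.

```python
# Hypothetical reconstruction of the reported hyperparameters using the
# standard transformers Trainer API.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ht-phase2-140k",     # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=12,  # 1 x 12 x 2 GPUs = 24 effective
    num_train_epochs=3.0,
    optim="adamw_torch",             # default betas and epsilon
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```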
Intended Use Cases
Specific intended uses are not detailed in the card. As a fine-tuned Qwen-based model, however, it is generally suitable for a wide range of natural language processing tasks, including text generation, summarization, and question answering, benefiting from its 7.6B parameters and 32,768-token context length. A hedged usage sketch follows.
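The example below continues from the loading sketch above and assumes the tokenizer ships a Qwen-style chat template, which is typical for Qwen fine-tunes but not stated in this card.

```python
# Illustrative generation example; reuses `model` and `tokenizer` from the
# loading sketch and assumes a Qwen-style chat template is available.
messages = [{"role": "user", "content": "Summarize the following text: ..."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```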