ShourenWSR/HT-phase_scale-Qwen-140k-phase2

Text generation · Concurrency cost: 1 · Model size: 7.6B · Quant: FP8 · Context length: 32k · Published: Sep 16, 2025 · License: other · Architecture: Transformer

ShourenWSR/HT-phase_scale-Qwen-140k-phase2 is a 7.6-billion-parameter language model, fine-tuned from a phase-1 Qwen checkpoint on the phase2_140k dataset. It is the second iteration in a multi-phase training process, building on prior fine-tuning, and is designed for general language understanding and generation tasks, leveraging the Qwen architecture for broad applicability.


Model Overview

ShourenWSR/HT-phase_scale-Qwen-140k-phase2 is a 7.6-billion-parameter language model representing the second phase of fine-tuning in a multi-stage training process. It builds on an earlier checkpoint, Qwen_phase1_140k, and was further fine-tuned on a dataset referred to as phase2_140k, an iterative approach that points to progressive refinement of the model's capabilities.
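
The card does not include usage code, but since the model derives from Qwen, it should load through the standard Hugging Face causal-LM interface. The sketch below assumes exactly that; the prompt and generation settings are illustrative, not taken from the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal loading sketch. Assumes the checkpoint exposes the standard
# Qwen/Transformers causal-LM interface; not confirmed by the card.
model_id = "ShourenWSR/HT-phase_scale-Qwen-140k-phase2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the checkpoint's stored precision
    device_map="auto",   # spread weights across available devices
)

prompt = "Explain the difference between fine-tuning and pretraining."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```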

Training Details

The model was trained with the following hyperparameters (restated as a Trainer configuration sketch after the list):

  • Learning Rate: 1e-05
  • Batch Size: A per-device train_batch_size of 1 with gradient_accumulation_steps of 12 across 2 GPUs, giving a total_train_batch_size of 24 (1 × 12 × 2).
  • Optimizer: adamw_torch with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: Trained for 3.0 epochs.
  • Environment: Distributed training across 2 GPUs.
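
For readers who want these numbers in one place, here is the same configuration expressed as Hugging Face TrainingArguments. The field values come from the list above; the use of the Trainer API itself, and the output directory name, are assumptions.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters as a Trainer configuration.
# Values mirror the card; the Trainer API usage is an assumption.
training_args = TrainingArguments(
    output_dir="HT-phase_scale-Qwen-140k-phase2",  # hypothetical path
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=12,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",  # default betas (0.9, 0.999) and epsilon (1e-8)
)
# With 2 GPUs: 1 (per device) x 12 (accumulation) x 2 (GPUs) = 24 effective batch size.
```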

Intended Use Cases

While specific intended uses are not detailed in the card, as a fine-tuned Qwen-based model it is generally suitable for a wide range of natural language processing tasks, including text generation, summarization, and question answering. Its 7.6B parameter count and 32,768-token context length also make it practical for long-input workloads.
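
As a concrete illustration of one such task, the sketch below frames summarization as plain instruction-style text generation via the transformers pipeline API. The prompt wording and decoding settings are illustrative assumptions, not documented behavior.

```python
from transformers import pipeline

# Illustrative sketch: summarization framed as instruction-style
# generation. Prompt and decoding settings are assumptions.
generator = pipeline(
    "text-generation",
    model="ShourenWSR/HT-phase_scale-Qwen-140k-phase2",
    device_map="auto",
)

article = "..."  # input text; the 32,768-token context leaves ample room
result = generator(
    f"Summarize the following article in three sentences:\n\n{article}",
    max_new_tokens=200,
    do_sample=False,  # greedy decoding for a deterministic summary
)
print(result[0]["generated_text"])
```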