ShourenWSR/HT-ht-analysis-Qwen-think-only

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Sep 21, 2025 · License: other · Architecture: Transformer

The ShourenWSR/HT-ht-analysis-Qwen-think-only model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B. It was trained on the ht-analysis_think_only dataset, which suggests an optimization for analytical or reasoning tasks. With a 32768-token context length, it can process the extensive inputs relevant to its specialized training data.


Overview

This model, named Qwen_think_only, is a fine-tuned variant of Qwen/Qwen2.5-7B, with 7.6 billion parameters and a 32768-token context window. It was fine-tuned on the ht-analysis_think_only dataset.

Key Capabilities

  • Specialized Fine-tuning: Optimized through training on the ht-analysis_think_only dataset, indicating a focus on specific analytical or reasoning tasks.
  • Base Model: Built upon the robust Qwen2.5-7B foundation, inheriting its general language understanding and generation capabilities.
  • Extended Context Window: Supports a 32768-token context length, enabling the processing of lengthy and complex inputs.
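Since the model follows the standard Qwen2.5 chat format, it can be loaded with Hugging Face `transformers` in the usual way. The sketch below is an assumption based on the base model's conventions, not an official usage example from the card; the question text and generation settings are placeholders.

```python
# Hedged sketch: loading and querying the model via Hugging Face transformers.
# The repo id comes from the card; prompt content and generation parameters
# are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ShourenWSR/HT-ht-analysis-Qwen-think-only"

def build_messages(question: str) -> list[dict]:
    """Build a single-turn chat in the format expected by apply_chat_template."""
    return [{"role": "user", "content": question}]

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = build_messages("Analyze the reasoning steps in the following argument: ...")
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Note that serving the full 32768-token context at FP8 or BF16 precision requires substantial GPU memory; for long-context workloads a dedicated inference server may be preferable to a plain `generate` loop.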

Training Details

The model was trained using the following hyperparameters:

  • Learning Rate: 1e-05
  • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • Epochs: 3.0
  • Batch Size: 1 (train) / 8 (eval), with 12 gradient accumulation steps for a total effective train batch size of 24.
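The effective batch size follows from the per-device batch size, the gradient accumulation steps, and the device count. The first two figures are from the card; the device count of 2 is an inference needed to reconcile 1 × 12 with the stated total of 24.

```python
# Sketch of the effective-batch-size arithmetic implied by the hyperparameters.
# per_device=1 and grad_accum=12 are from the card; num_devices=2 is inferred,
# since 1 * 12 * 2 = 24 matches the stated total.
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device * grad_accum * num_devices

print(effective_batch_size(per_device=1, grad_accum=12, num_devices=2))  # → 24
```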

Good For

  • Applications requiring analysis or reasoning based on the ht-analysis_think_only dataset's characteristics.
  • Tasks benefiting from a large context window for detailed information processing.

Limitations

  • The card does not fully document intended uses and limitations, so the model should be evaluated before being applied to tasks outside its fine-tuning domain.