UWNSL/Qwen2.5-3B-Instruct_Mix-Long

Text generation · Model size: 3.1B · Quantization: BF16 · Context length: 32k · Published: Feb 24, 2025 · License: other · Architecture: Transformer

UWNSL/Qwen2.5-3B-Instruct_Mix-Long is a 3.1-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. It was trained on the Mix-Long_long_0.2_short_0.8 dataset, whose name suggests a mix of roughly 20% long and 80% short examples, and it supports contexts up to 32,768 tokens. The model is intended for general instruction-following tasks, with the fine-tuning aimed at improving performance on inputs of varied length.


Model Overview

UWNSL/Qwen2.5-3B-Instruct_Mix-Long builds on the base architecture of Qwen/Qwen2.5-3B-Instruct and was fine-tuned on the Mix-Long_long_0.2_short_0.8 dataset. The dataset name points to a deliberate blend of long and short training examples, with the goal of handling content across a spectrum of input lengths, including extended contexts up to 32,768 tokens.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen2.5-3B-Instruct, a robust causal language model.
  • Parameter Count: Features 3.1 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a substantial context window of 32768 tokens, enabling the processing of longer documents and conversations.
  • Training Data: Fine-tuned on the Mix-Long_long_0.2_short_0.8 dataset; the name suggests a sampling ratio of about 20% long and 80% short examples, aimed at improving performance across diverse input lengths.
  • Training Performance: Achieved a loss of 0.2159 on the evaluation set during fine-tuning.

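Since the model inherits the Qwen2.5 chat format, prompts follow the ChatML layout (`<|im_start|>role ... <|im_end|>`). In practice you would let `tokenizer.apply_chat_template` from the `transformers` library produce this string; the sketch below is a minimal, dependency-free illustration of the layout the template generates, assuming the standard Qwen2.5 ChatML conventions.

```python
def build_chatml_prompt(messages):
    """Render a list of {"role", "content"} dicts in ChatML form,
    ending with an open assistant turn as the generation prompt."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # model continues from here
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the report below."},
])
print(prompt)
```

When serving the model, the resulting string (or the tokenizer's own template output) is what gets tokenized and passed to `generate`.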
Potential Use Cases

  • Instruction Following: Designed for general instruction-following tasks, benefiting from its instruction-tuned base.
  • Long-Context Applications: Suitable for tasks requiring the understanding or generation of extended text, such as summarization of long documents or detailed content creation.
  • Mixed-Length Inputs: Potentially well-suited for scenarios where input prompts vary significantly in length, from short queries to comprehensive requests.