micaebe/Qwen2.5-1.5B-Instruct-QwQ

Text generation · Concurrency cost: 1 · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Dec 2, 2024 · License: apache-2.0 · Architecture: Transformer

micaebe/Qwen2.5-1.5B-Instruct-QwQ is a 1.54 billion parameter instruction-tuned causal language model based on the Qwen2.5 architecture, developed by micaebe. The model is fine-tuned on QwQ reasoning chains, improving its performance on mathematical and general reasoning tasks. It also demonstrates some self-correction ability, making it suitable for applications that need stronger logical processing in a compact model with a 32,768 token context length.


Overview

micaebe/Qwen2.5-1.5B-Instruct-QwQ is a 1.54 billion parameter instruction-tuned causal language model, fine-tuned from the Qwen2.5-1.5B-Instruct base model. It uses the Qwen2.5 architecture: a transformer with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings. The model supports a context length of 32,768 tokens for input and up to 8,192 tokens of generation.
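Since the model follows the standard Qwen2.5 chat format, it can presumably be used through the Hugging Face Transformers chat-template API like any other Qwen2.5-Instruct checkpoint. A minimal sketch (the `chat` helper and its parameters are illustrative, not part of the model card; the library is imported lazily so the sketch can be read without it installed):

```python
MODEL_ID = "micaebe/Qwen2.5-1.5B-Instruct-QwQ"

def chat(prompt: str, max_new_tokens: int = 512) -> str:
    """Run a single chat turn; weights are downloaded on first call."""
    # Imported inside the function so merely defining the helper
    # does not require transformers/torch to be present.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Wrap the prompt in the Qwen2.5 chat template.
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the completion.
    completion = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(completion, skip_special_tokens=True)
```

Because QwQ-style reasoning chains tend to be long, a generous `max_new_tokens` budget (the model allows up to 8,192 tokens of generation) is advisable for math problems.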

Key Capabilities

  • Enhanced Mathematical Reasoning: Fine-tuned on approximately 20,000 samples from QwQ-32B-Preview, including math problems from GSM8k and MATH datasets, leading to improved performance in mathematical contexts.
  • General Reasoning: Shows better general reasoning capabilities compared to its base model.
  • Self-Correction: Exhibits some self-correction abilities, though these are noted to be more limited than in larger Qwen2.5 models (e.g., 3B and 7B versions).

Performance

  • Achieves 73.2% accuracy on the GSM8k test set (evaluated on the first 27% of the test examples).

Good For

  • Applications requiring a compact model with improved mathematical and general reasoning.
  • Use cases where some degree of self-correction is beneficial, particularly in a 1.5B parameter size class.