UWNSL/Qwen2.5-3B-Instruct_Long_CoT

Hugging Face · Text generation · Model size: 3.1B · Quantization: BF16 · Context length: 32k · Published: Dec 22, 2024 · License: other · Architecture: Transformer

UWNSL/Qwen2.5-3B-Instruct_Long_CoT is a 3.1 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. This model is specifically optimized for mathematical reasoning tasks, having been trained on the MATH_training_Qwen_QwQ_32B_Preview dataset. It is designed for applications requiring enhanced performance in solving complex mathematical problems.


Model Overview

UWNSL/Qwen2.5-3B-Instruct_Long_CoT is a fine-tuned variant of the base Qwen/Qwen2.5-3B-Instruct model, specialized for mathematical reasoning. Fine-tuning used a learning rate of 1e-05 over 2 epochs and reached a final validation loss of 0.3268.
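
The card does not include example code, so the following is a minimal inference sketch using the Transformers library. The model ID comes from the card; the chat-template usage and generation settings are assumptions carried over from the base Qwen2.5-3B-Instruct model, which ships a chat template.

```python
# Minimal inference sketch, assuming the fine-tune keeps the
# Qwen2.5 tokenizer's chat template (the base model provides one).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UWNSL/Qwen2.5-3B-Instruct_Long_CoT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "Solve for x: 2x^2 - 8x + 6 = 0. Show your reasoning."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Long chain-of-thought outputs need generous headroom; the 32k
# context length leaves ample room for extended reasoning traces.
output = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```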

Key Capabilities

  • Mathematical Reasoning: Fine-tuned on the MATH_training_Qwen_QwQ_32B_Preview dataset, which optimizes it for mathematical problem solving.
  • Instruction Following: Inherits the instruction-following capabilities of its base model, Qwen2.5-3B-Instruct.

Training Details

The training process involved:

  • Base Model: Qwen/Qwen2.5-3B-Instruct
  • Dataset: MATH_training_Qwen_QwQ_32B_Preview
  • Hyperparameters: Learning rate of 1e-05, train_batch_size of 4, eval_batch_size of 1, and 2 training epochs (see the configuration sketch after this list).
  • Frameworks: Transformers 4.46.1, PyTorch 2.5.1+cu124, Datasets 3.1.0, Tokenizers 0.20.3.
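
The card does not publish a training script, so the sketch below simply maps the listed hyperparameters onto Transformers `TrainingArguments` as one plausible configuration. The output directory is hypothetical; only the numeric values are taken from the card.

```python
# Hedged sketch of the reported fine-tuning configuration.
# Only the hyperparameter values below come from the card;
# everything else is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-3b-instruct-long-cot",  # hypothetical path
    learning_rate=1e-5,             # learning rate per the card
    per_device_train_batch_size=4,  # train_batch_size: 4
    per_device_eval_batch_size=1,   # eval_batch_size: 1
    num_train_epochs=2,             # 2 training epochs
    bf16=True,                      # matches the published BF16 weights
)
```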

Good For

  • Applications requiring a compact model with enhanced mathematical reasoning abilities.
  • Tasks that require solving mathematical problems or generating worked responses to mathematical queries.