jahyungu/Qwen2.5-1.5B-Instruct_gsm8k
The jahyungu/Qwen2.5-1.5B-Instruct_gsm8k model is a 1.5-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. As its name indicates, it has been adapted for the GSM8K dataset, suggesting an optimization for mathematical reasoning and problem solving. It supports a 32,768-token context length, making it suitable for processing long inputs in its specialized domain.
Model Overview
jahyungu/Qwen2.5-1.5B-Instruct_gsm8k is a 1.5-billion-parameter instruction-tuned model built on Qwen/Qwen2.5-1.5B-Instruct. Its name indicates further fine-tuning for the GSM8K dataset, a benchmark of grade-school mathematical word problems, which implies enhanced capability in numerical reasoning and step-by-step problem solving.
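The model can be loaded through the standard Transformers causal-LM API. The snippet below is a minimal sketch: the model ID comes from this card, while the dtype and device settings are common defaults, not confirmed training-time choices.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jahyungu/Qwen2.5-1.5B-Instruct_gsm8k"

# Download the fine-tuned checkpoint and its tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # assumption: use the dtype stored in the checkpoint
    device_map="auto",    # assumption: place weights on available devices
)
```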
Key Characteristics
- Base Model: Qwen2.5-1.5B-Instruct
- Parameter Count: 1.5 billion parameters
- Context Length: Supports a 32,768-token context window (see the verification sketch after this list).
- Specialization: Fine-tuned for tasks related to the GSM8K dataset, indicating a focus on mathematical reasoning.
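The advertised context window can be checked against the checkpoint's configuration. A minimal sketch, assuming the model ID from this card:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("jahyungu/Qwen2.5-1.5B-Instruct_gsm8k")
print(config.max_position_embeddings)  # expected: 32768, per this card
```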
Training Details
The model was trained with a learning rate of 1e-05, a per-device batch size of 2 accumulated to an effective batch size of 16, and the AdamW optimizer. A cosine learning-rate scheduler with a warmup ratio of 0.03 was used over 3 epochs. The training environment included Transformers 4.50.0, PyTorch 2.6.0+cu124, Datasets 3.4.1, and Tokenizers 0.21.0.
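These hyperparameters map onto Transformers TrainingArguments roughly as sketched below. This is a reconstruction from the card, not the original training script; output_dir is a placeholder, and gradient_accumulation_steps=8 is inferred from the per-device batch size of 2 and the effective batch size of 16.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-1.5b-instruct-gsm8k",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # 2 x 8 = effective batch size of 16 (inferred)
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=3,
)
```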
Intended Use Cases
This model is likely best suited for applications requiring strong performance in grade-school mathematical problem solving, particularly problems at the level of difficulty covered by GSM8K. Because it is instruction-tuned, it responds well to natural-language prompts, and chat-style prompting is a natural way to pose word problems in this domain.
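As an illustration, a GSM8K-style word problem can be posed through the tokenizer's chat template. This continues the loading sketch above; the question and generation settings are illustrative, not taken from the model's evaluation setup.

```python
# Assumes `model` and `tokenizer` from the loading sketch above.
question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)
messages = [{"role": "user", "content": question}]

# Format the question with the model's chat template and move it to the model's device.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```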