tengfeima-ai/Qwen2.5-0.5B-Math-SFT-1024
tengfeima-ai/Qwen2.5-0.5B-Math-SFT-1024 is a 0.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-0.5B-Instruct on the deepmath_sft_1024 dataset. It is optimized for mathematical reasoning and problem-solving and supports a 32,768-token context length.
Overview
tengfeima-ai/Qwen2.5-0.5B-Math-SFT-1024 is a specialized 0.5-billion-parameter language model: a supervised fine-tune of the Qwen/Qwen2.5-0.5B-Instruct base model, trained on the deepmath_sft_1024 dataset.
Key Capabilities
- Mathematical Task Optimization: Fine-tuning on the math-focused deepmath_sft_1024 dataset targets stronger performance on mathematical reasoning and problem-solving tasks.
- Context Length: A 32,768-token context window allows the model to process long mathematical problems and supporting material.
- Performance: Training reached a final validation loss of 0.6684 on the fine-tuning dataset.
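The model can be loaded with the standard Transformers API. The sketch below is a minimal inference example, assuming the model keeps the Qwen2.5-Instruct chat-template convention of its base model; the prompt, dtype, and generation settings are illustrative, not taken from the card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tengfeima-ai/Qwen2.5-0.5B-Math-SFT-1024"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16-capable GPU; use torch.float32 on CPU
    device_map="auto",
)

# Chat-style prompt in the Qwen2.5-Instruct format; the question is illustrative.
messages = [{"role": "user", "content": "Solve for x: 3x + 7 = 22. Show your steps."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```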
Training Details
The model was trained for 3 epochs with a cosine learning rate scheduler, a learning rate of 2e-05, a per-device batch size of 4, and 8 gradient accumulation steps, for a reported total training batch size of 64 (consistent with training across two devices). Training used Transformers 4.57.1 and PyTorch 2.4.1+cu124.
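These hyperparameters map directly onto the Transformers TrainingArguments API. The sketch below reconstructs that configuration; it is not the authors' actual training script, and output_dir, precision, and the evaluation/save cadence are assumptions.

```python
from transformers import TrainingArguments

# Reconstruction of the reported hyperparameters. output_dir, bf16, and the
# evaluation/save cadence are illustrative assumptions, not from the card.
training_args = TrainingArguments(
    output_dir="Qwen2.5-0.5B-Math-SFT-1024",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # reported total train batch size: 64
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    bf16=True,
    eval_strategy="epoch",
    save_strategy="epoch",
)
```

A supervised fine-tuning loop (for example, TRL's SFTTrainer) would combine these arguments with the deepmath_sft_1024 data; how that dataset is hosted and formatted is not specified on the card.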
Good for
- Applications requiring a compact model for mathematical problem-solving.
- Research and development in mathematical reasoning with LLMs.
- Scenarios where a specialized, smaller model is preferred over larger, general-purpose alternatives for math-centric tasks.