tengfeima-ai/Qwen2.5-0.5B-Math-SFT-Concise
Model Overview
The tengfeima-ai/Qwen2.5-0.5B-Math-SFT-Concise is a 0.5-billion-parameter language model fine-tuned from the base Qwen/Qwen2.5-0.5B-Instruct model. It was specialized through supervised fine-tuning on the concise_sft dataset and, as the name suggests, is optimized for brief, direct responses to math problems. The model retains the base model's 32,768-token context length, so it can condition on extensive input while still producing short outputs, and its small parameter count makes it a good fit for applications where an efficient model with a strong emphasis on brevity is beneficial.
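The model can be loaded with the standard transformers chat API. The sketch below is illustrative only: it assumes the model inherits the chat template of its Qwen2.5-Instruct base, and the prompt is a made-up example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tengfeima-ai/Qwen2.5-0.5B-Math-SFT-Concise"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Assumed: the model keeps the chat template of its Qwen2.5-Instruct base.
messages = [{"role": "user", "content": "What is 17 * 24? Answer briefly."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```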
Training Details
This model was trained using the following key hyperparameters:
- Learning Rate: 2e-05
- Batch Size: 8 (train), 8 (eval)
- Gradient Accumulation: 4 steps, for a total training batch size of 64 (8 per device × 4 accumulation steps, implying 2 training devices)
- Optimizer: ADAMW_TORCH
- LR Scheduler: cosine, with a 0.1 warmup ratio
- Epochs: 3
During training, the model reached a final validation loss of 0.6726; the lowest validation loss, 0.6232, was recorded at step 2500. Training used Transformers 4.57.1, PyTorch 2.4.1+cu124, Datasets 4.0.0, and Tokenizers 0.22.2.
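For reference, these settings map onto a Hugging Face TrainingArguments configuration roughly as follows. This is a sketch only: the actual training script and dataset pipeline are not published, and the output directory name is hypothetical.

```python
from transformers import TrainingArguments

# Reconstruction of the reported hyperparameters; output_dir is a
# hypothetical name, not the authors' actual path.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-math-sft-concise",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 8 x 4 x 2 devices = 64 effective batch
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
)
```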
Potential Use Cases
Given its fine-tuning on the concise_sft dataset, this model is likely best suited for applications requiring:
- Summarization tasks where brevity is paramount.
- Generating direct answers or short explanations.
- Efficient deployment in resource-constrained environments, thanks to its small parameter count (see the sketch below).
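As an illustration of the last point, a pipeline-based deployment might cap generation length to play to the model's concise style. All settings below are assumptions, not published recommendations.

```python
from transformers import pipeline

# Illustrative deployment sketch; the prompt and settings are assumptions.
generator = pipeline(
    "text-generation",
    model="tengfeima-ai/Qwen2.5-0.5B-Math-SFT-Concise",
    torch_dtype="auto",
)

messages = [{"role": "user", "content": "In one sentence, why is 0.5 rational?"}]
result = generator(messages, max_new_tokens=64)  # tight cap suits the concise style
print(result[0]["generated_text"][-1]["content"])  # last message is the reply
```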