Name: xw1234gan/Extended_Merging_Qwen2.5-3B-Instruct_MATH_lr1e-05_mb2_ga128_n2048_seed42 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: xw1234gan

Model Overview

This model, xw1234gan/Extended_Merging_Qwen2.5-3B-Instruct_MATH_lr1e-05_mb2_ga128_n2048_seed42, is a 3.1 billion parameter instruction-tuned language model built upon the Qwen2.5 architecture. It has been developed with a specific focus on enhancing its performance in mathematical domains through an extended merging technique during its training process.

Key Characteristics

Base Architecture: Qwen2.5-3B-Instruct, providing a robust foundation for instruction following.
Parameter Count: 3.1 billion parameters, offering a balance between performance and computational efficiency.
Context Length: Supports a context length of 32768 tokens, allowing for processing of longer mathematical problems or related instructions.
Specialized Training: Fine-tuned with a learning rate of 1e-05, a micro-batch size of 2, and gradient accumulation of 128, over 2048 steps, with a seed of 42, indicating a focused training regimen.

Intended Use Cases

This model is particularly well-suited for applications requiring strong mathematical reasoning and problem-solving capabilities. While specific benchmarks are not provided in the model card, its naming convention and training parameters suggest an optimization for:

Solving mathematical equations and word problems.
Assisting in educational tools for math students.
Generating explanations for mathematical concepts.
Applications where numerical accuracy and logical deduction are paramount.

Overview

Model Overview

Key Characteristics

Intended Use Cases

Full Model Card (README)