Model Overview
SidhaarthMurali/llama3.2-1b-gsm8k-full is a fine-tuned version of the meta-llama/Llama-3.2-1B base model, developed by SidhaarthMurali to improve performance on mathematical reasoning and word-problem solving.
Key Capabilities
- Mathematical Reasoning: Fine-tuned on the gsm8k dataset of grade-school math word problems, strengthening multi-step arithmetic and logical reasoning.
- Identity Responses: Also trained on an 'identity' dataset, which is commonly used to teach a model to answer questions about its own name and developer consistently.
Training Details
The model was trained for 3 epochs with a learning rate of 1e-05 and a per-device batch size of 1, using gradient accumulation for an effective batch size of 2. Training used the AdamW optimizer and a cosine learning-rate scheduler with a warmup ratio of 0.1, and was conducted with Transformers 4.46.1 and PyTorch 2.4.0.
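To make the schedule above concrete, here is a minimal sketch of a linear-warmup-then-cosine-decay learning-rate curve with the stated peak LR of 1e-05 and warmup ratio of 0.1. It mirrors the shape of Transformers' `get_cosine_schedule_with_warmup`, but it is an illustrative reimplementation, not the library's exact code.

```python
import math

# Hyperparameters taken from the training description above.
PEAK_LR = 1e-05
WARMUP_RATIO = 0.1

def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at a given optimizer step: linear warmup over the
    first 10% of steps, then cosine decay from the peak down to zero."""
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, warmup_steps)
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 100 total steps the rate ramps up over the first 10 steps, peaks at 1e-05, and decays smoothly to zero by the final step.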
Intended Use Cases
This model is well-suited to applications that require accurate numerical reasoning and step-by-step solutions to math word problems, such as educational tools, automated problem solvers, and data-analysis pipelines where mathematical logic is central.
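A minimal inference sketch with the Transformers library is shown below. The `build_prompt` helper and its "Question:/Answer:" framing are assumptions for illustration; the exact prompt template the model was fine-tuned with is not documented here, so adjust it to match your training format.

```python
MODEL_ID = "SidhaarthMurali/llama3.2-1b-gsm8k-full"

def build_prompt(question: str) -> str:
    # GSM8K-style question/answer framing. This template is an
    # assumption; use whatever format the model was trained on.
    return f"Question: {question}\nAnswer:"

def main() -> None:
    # Imported here so the helper above can be used without
    # transformers installed; loading the checkpoint requires it.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt("What is 12 * 7?"), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

Note that the base meta-llama/Llama-3.2-1B checkpoint is gated on the Hugging Face Hub, so downloading may require accepting the license and authenticating.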