kmseong/Llama3.2-3B-gsm8k-full-FT

Hugging Face · Text Generation

Model size: 3.2B · Quantization: BF16 · Context length: 32k · Published: Feb 23, 2026 · License: llama3.2 · Architecture: Transformer

kmseong/Llama3.2-3B-gsm8k-full-FT is a 3.2-billion-parameter Llama 3.2 Instruct model, developed by kmseong, that has been fully fine-tuned on the GSM8K dataset. The model specializes in mathematical reasoning, particularly grade school math word problems; all parameters were updated during training rather than using adapter methods such as LoRA. It achieves 40.00% accuracy on the GSM8K test set and is intended primarily for tasks requiring step-by-step arithmetic problem-solving.


Overview

This model, kmseong/Llama3.2-3B-gsm8k-full-FT, is a 3.2-billion-parameter Llama 3.2 Instruct variant that has undergone full parameter fine-tuning on the GSM8K dataset. Unlike models trained with LoRA, all ~3.2 billion parameters were updated during training, yielding a complete, standalone model optimized for mathematical reasoning.

Key Capabilities

  • Specialized Mathematical Reasoning: Highly optimized for solving grade school math problems, as demonstrated by its training on the GSM8K dataset.
  • Full Parameter Fine-tuning: All model weights were updated, which generally outperforms LoRA given sufficient training data, though it produces a larger standalone checkpoint (~6GB in BF16).
  • Direct Usage: Can be loaded directly with standard tooling, without requiring PEFT (Parameter-Efficient Fine-Tuning) libraries, which simplifies deployment; see the loading sketch after this list.
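
Because every weight lives in the checkpoint itself, loading works like any standalone Llama model. A minimal sketch, assuming a recent transformers version; the torch_dtype and device_map settings shown here are standard transformers options, not configuration documented by the model author:

```python
# Minimal loading sketch. Because this is a full fine-tune, the checkpoint
# loads like any standalone Llama model; no PEFT/adapter step is needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kmseong/Llama3.2-3B-gsm8k-full-FT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the published BF16 weights
    device_map="auto",           # place layers on the available GPU(s)
)
```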

Good For

  • Grade School Math Problems: Ideal for applications requiring accurate, step-by-step solutions to arithmetic and word problems similar to those in the GSM8K dataset; see the inference sketch after this list.
  • Research and Development: Suitable for exploring the impact of full parameter fine-tuning on smaller language models for specific tasks.
  • Benchmarking: Can serve as a baseline for evaluating mathematical reasoning capabilities in the 3B parameter class.
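
Continuing from the loading sketch above, a short inference example on a GSM8K-style word problem. The exact prompt format used during fine-tuning is not documented here, so this sketch assumes the standard Llama 3.2 Instruct chat template; greedy decoding is an arbitrary choice for reproducible arithmetic:

```python
# Inference sketch on a GSM8K-style word problem (reuses `tokenizer` and
# `model` from the loading sketch above). Assumes the standard Llama 3.2
# Instruct chat template; the training prompt format is not documented.
messages = [
    {
        "role": "user",
        "content": (
            "Natalia sold clips to 48 of her friends in April, and then "
            "she sold half as many clips in May. How many clips did "
            "Natalia sell altogether in April and May?"
        ),
    }
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```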

Limitations

  • Domain Specificity: Performance is primarily strong on GSM8K-like math problems and may degrade significantly on other mathematical domains or general language tasks.
  • Resource Intensive: Efficient BF16 inference calls for a GPU with at least 16GB of VRAM, since the full ~6GB checkpoint plus activation and KV-cache overhead must fit in memory; a quantized-loading workaround is sketched below.
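
If 16GB of VRAM is not available, one common workaround is 4-bit quantized loading via bitsandbytes. This is not documented by the model author and may cost some GSM8K accuracy; a minimal sketch:

```python
# Hypothetical memory-reduction sketch: 4-bit quantized loading via
# bitsandbytes for GPUs well under 16GB of VRAM. Quantization is not part
# of the published model and may degrade GSM8K accuracy.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # store weights in 4-bit, compute in BF16
)
model = AutoModelForCausalLM.from_pretrained(
    "kmseong/Llama3.2-3B-gsm8k-full-FT",
    quantization_config=bnb_config,
    device_map="auto",
)
```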