pavelslab-nyu/Llama-3.2-3B-ThinkSFT

TEXT GENERATIONConcurrency Cost:1Model Size:3.2BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Apr 20, 2026License:llama3.2Architecture:Transformer Cold

Llama-3.2-3B-ThinkSFT is a 3.2 billion parameter language model developed by pavelslab-nyu, fine-tuned from Llama-3.2-3B-Instruct. It specializes in explicit reasoning, particularly for mathematical tasks, by training on 43.5K reasoning traces from the OpenThoughts-114k dataset. This model is designed to enhance reasoning capabilities through a "Thinking SFT" pipeline, making it suitable for applications requiring detailed step-by-step problem-solving.

Loading preview...

Model Overview

Llama-3.2-3B-ThinkSFT is a 3.2 billion parameter language model developed by pavelslab-nyu, built upon the meta-llama/Llama-3.2-3B-Instruct base. Its core innovation lies in its Thinking SFT (Supervised Fine-Tuning) pipeline, which specifically targets and enhances reasoning abilities.

Key Capabilities

  • Enhanced Reasoning: Fine-tuned on 43.5K explicit reasoning traces from the math subset of the OpenThoughts-114k dataset.
  • Mathematical Problem Solving: Optimized for tasks requiring step-by-step logical deduction and explicit thought processes, as detailed in the associated research paper, "When Can LLMs Learn to Reason with Weak Supervision?" (Rahman et al., 2026).
  • Efficient Training: Trained for 3 epochs with a sequence length of 8,192, utilizing BF16 precision and Flash Attention 2 for efficiency.

Good For

  • Reasoning-intensive applications: Ideal for scenarios where models need to show their work or follow a logical chain of thought.
  • Mathematical tasks: Particularly strong in areas requiring explicit mathematical reasoning.
  • Research into LLM reasoning: A valuable tool for exploring how LLMs learn and apply reasoning with weak supervision.