PursuitOfDataScience/llama3.2-1b-thinking

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Ctx Length: 32k · Published: Nov 24, 2025 · License: other · Architecture: Transformer

PursuitOfDataScience/llama3.2-1b-thinking is a 1 billion parameter Llama 3.2-based model fine-tuned through a three-stage process of SFT, reasoning training, and DPO. It is optimized for instruction-following chat, multi-turn conversations, and enhanced step-by-step reasoning using Chain-of-Thought (CoT) delimited by `<think>` tags. The model aims to provide helpful, concise responses, particularly for tasks requiring explicit logical reasoning.


Overview

PursuitOfDataScience/llama3.2-1b-thinking is a 1 billion parameter language model built upon the meta-llama/Llama-3.2-1B base. It has undergone a comprehensive three-stage fine-tuning process to enhance its conversational and reasoning capabilities.

Key Capabilities

  • Instruction Following: Supervised fine-tuning (SFT) on HuggingFaceH4/ultrachat_200k enables the model to generate helpful and concise responses in an instruction-style, multi-turn chat format.
  • Enhanced Reasoning: Specialized training using the open-r1/Mixture-of-Thoughts dataset significantly improves its step-by-step reasoning and Chain of Thought (CoT) capabilities, allowing it to process complex problems with explicit thought processes indicated by <think> tags.
  • Preference Alignment: Direct Preference Optimization (DPO) with mlabonne/orpo-dpo-mix-40k refines response quality, aligning outputs with human preferences for safety, helpfulness, and adherence to user constraints.
  • Chat-style Interaction: Designed for chat applications, it processes prompts as lists of messages using tokenizer.apply_chat_template.
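Since the card does not include a usage snippet, the sketch below illustrates the chat message format expected by `tokenizer.apply_chat_template` and how the `<think>` reasoning block might be separated from the final answer. The exact tag format and the sample response are assumptions based on the description above, not output captured from this model.

```python
import re

# Chat-style prompt as a list of role/content messages -- the format
# consumed by tokenizer.apply_chat_template per the model card.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 17 * 24?"},
]

def split_thinking(text: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer).

    Assumes the model wraps its chain of thought in <think>...</think>
    tags, as the card describes; returns empty reasoning if absent.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Hypothetical model response, for illustration only:
sample = "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>The answer is 408."
reasoning, answer = split_thinking(sample)
```

Stripping the `<think>` span before display is a common pattern for reasoning-tuned models: the tags let applications show or hide the chain of thought independently of the final answer.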

Training Details

The model's development involved:

  1. SFT: Fine-tuning on multi-turn dialogues from HuggingFaceH4/ultrachat_200k.
  2. Reasoning Training: Focused on open-r1/Mixture-of-Thoughts for CoT enhancement.
  3. DPO Alignment: Optimized with mlabonne/orpo-dpo-mix-40k to improve response quality and alignment.
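To give intuition for the third stage, here is a minimal numeric sketch of the standard DPO objective: a negative log-sigmoid of the scaled difference between the policy-vs-reference log-ratios of the chosen and rejected responses. The log-probabilities and β value below are illustrative, not taken from this model's training run.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    loss = -log sigmoid(beta * (chosen_logratio - rejected_logratio)),
    where each log-ratio is the policy log-prob minus the reference
    model's log-prob for that response.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative numbers: the policy favors the chosen response more than
# the reference does (margin = 0.1 * 2.0 = 0.2), so the loss dips below
# the indifference value -log(0.5).
loss = dpo_loss(-10.0, -12.0, -11.0, -11.0)
```

Minimizing this loss pushes the policy to assign relatively more probability to preferred responses than the frozen reference model does, with β controlling how far the policy may drift from the reference.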

Limitations

As a small 1B-parameter model, it may hallucinate or struggle with highly complex, multi-step reasoning tasks. Users should verify critical information, as outputs may occasionally be inaccurate, unsafe, or biased.