PursuitOfDataScience/llama3.2-1b-thinking
PursuitOfDataScience/llama3.2-1b-thinking is a 1 billion parameter Llama 3.2-based model fine-tuned through a three-stage process of SFT, reasoning training, and DPO. It is optimized for instruction-following chat, multi-turn conversations, and enhanced step-by-step reasoning using Chain of Thought (CoT) expressed in <think> tags. The model aims to provide helpful and concise responses, particularly for tasks requiring logical thought processes.
Overview
PursuitOfDataScience/llama3.2-1b-thinking is a 1 billion parameter language model built upon the meta-llama/Llama-3.2-1B base. It has undergone a comprehensive three-stage fine-tuning process to enhance its conversational and reasoning capabilities.
Key Capabilities
- Instruction Following: Supervised fine-tuning (SFT) on HuggingFaceH4/ultrachat_200k enables the model to generate helpful and concise responses in an instruction-style, multi-turn chat format.
- Enhanced Reasoning: Specialized training on the open-r1/Mixture-of-Thoughts dataset significantly improves its step-by-step reasoning and Chain of Thought (CoT) capabilities, allowing it to work through complex problems with explicit thought processes delimited by <think> tags.
- Preference Alignment: Direct Preference Optimization (DPO) with mlabonne/orpo-dpo-mix-40k refines response quality, aligning outputs with human preferences for safety, helpfulness, and adherence to user constraints.
- Chat-style Interaction: Designed for chat applications, it processes prompts as lists of messages via tokenizer.apply_chat_template (see the usage sketch below).
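Below is a minimal inference sketch using the Transformers library. The example prompt and the generation settings (such as max_new_tokens) are illustrative assumptions, not values recommended by this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PursuitOfDataScience/llama3.2-1b-thinking"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Chat-style prompts are passed as a list of role/content messages.
messages = [
    {"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"},
]

# apply_chat_template renders the messages into the model's expected prompt format.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens; reasoning may appear inside <think> tags.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```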
Training Details
The model's development involved:
- SFT: Fine-tuning on multi-turn dialogues from HuggingFaceH4/ultrachat_200k (an illustrative sketch of this stage follows the list).
- Reasoning Training: Focused on open-r1/Mixture-of-Thoughts for CoT enhancement.
- DPO Alignment: Optimized with mlabonne/orpo-dpo-mix-40k to improve response quality and alignment.
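As an illustration of what the first stage could look like, here is a hedged SFT sketch using the TRL library. It is not the author's actual training script: the hyperparameters and output path are assumptions, and exact trainer arguments may vary across trl versions.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# ultrachat_200k provides multi-turn conversations; the "train_sft" split is
# the one typically used for supervised fine-tuning.
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

trainer = SFTTrainer(
    model="meta-llama/Llama-3.2-1B",    # base model named in this card
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="llama3.2-1b-sft",   # hypothetical output directory
        per_device_train_batch_size=4,  # assumed batch size
        num_train_epochs=1,             # assumed epoch count
    ),
)
trainer.train()
```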
Limitations
As a relatively small 1B parameter model, it may exhibit limitations such as hallucination or difficulty with highly complex, multi-step reasoning tasks. Users should verify critical information, as outputs may occasionally be inaccurate, unsafe, or biased.