ermiaazarkhalili/VibeThinker-3B-SFT-Claude-Opus-Reasoning-Unsloth

TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 20, 2026License:mitArchitecture:Transformer0.0K Open Weights Cold

VibeThinker-3B-SFT-Claude-Opus-Reasoning-Unsloth is a 3.1 billion parameter language model developed by ermiaazarkhalili, fine-tuned from VibeThinker-3B. It is specifically optimized for reasoning distillation and chain-of-thought learning, utilizing the claude-reasoning-distillation dataset. The model leverages Unsloth for efficient training, resulting in faster fine-tuning and reduced VRAM usage. This model is designed to excel in tasks requiring step-by-step reasoning, with a context length of 2048 tokens.

Loading preview...

Model Overview

This model, VibeThinker-3B-SFT-Claude-Opus-Reasoning-Unsloth, is a 3.1 billion parameter language model developed by ermiaazarkhalili. It is a fine-tuned version of the VibeThinker-3B base model, specifically optimized for reasoning distillation and chain-of-thought capabilities.

Key Capabilities & Features

  • Reasoning Distillation: Fine-tuned on the claude-reasoning-distillation dataset, which contains over 10,000 samples of Claude's reasoning traces with <think> blocks.
  • Efficient Training: Utilizes Unsloth for fine-tuning, enabling 2x faster training and 60% less VRAM consumption compared to standard methods.
  • QLoRA Fine-tuning: Trained using QLoRA (4-bit) with a 2048 token context window.
  • Accessibility: Available in GGUF versions for CPU and edge inference, supporting platforms like Ollama and llama.cpp.

Use Cases

This model is particularly well-suited for:

  • Reasoning Tasks: Ideal for applications requiring step-by-step problem-solving and logical deduction.
  • Educational Tools: Can be integrated into systems that teach or demonstrate complex reasoning processes.
  • Resource-Efficient Deployment: Its 3.1B parameter size and Unsloth optimization make it suitable for environments with limited computational resources.

Limitations

  • Primarily trained on English data.
  • Knowledge is limited to the base model's training data cutoff.
  • May exhibit hallucinations and is not extensively safety-tuned.