qihoo360/TinyR1-32B-Preview

Parameters: 32.8B · Precision: FP8 · Context length: 131,072 tokens · License: apache-2.0
Overview

TinyR1-32B-Preview: A Specialized Reasoning Model

TinyR1-32B-Preview is qihoo360's first-generation reasoning model, designed to excel in mathematics, coding, and science. This 32-billion-parameter model is built on the DeepSeek-R1-Distill-Qwen-32B base and was fine-tuned using the 360-LLaMA-Factory framework.
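
For orientation, the model can be loaded through the standard Hugging Face transformers API, as in the minimal sketch below; the dtype and device settings are illustrative choices, not values specified by the model card.

```python
# Minimal loading sketch for TinyR1-32B-Preview using the standard
# transformers API. dtype/device settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/TinyR1-32B-Preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard across available accelerators
)
```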

Key Capabilities & Features

  • Domain-Specific Optimization: Achieves strong performance in Mathematics, Coding, and Science through a two-stage training approach: supervised fine-tuning (SFT) on specialized datasets for each domain, followed by model merging (a simplified sketch of the merging idea follows this list).
  • Competitive Reasoning Performance: Outperforms the 70B-parameter DeepSeek-R1-Distill-Llama-70B in mathematics (78.1 on AIME 2024) and posts competitive results in coding (61.6 on LiveCodeBench) and science (65.0 on GPQA-Diamond).
  • Training Data: Utilizes Chain-of-Thought (CoT) trajectories from datasets like OpenR1-Math-220k (math), OpenThoughts-114k (coding & science), and simplescaling/data_ablation_full59K (science).
  • Open-Sourced Resources: The training datasets and the full training and evaluation pipeline are open-sourced, along with a technical report on arXiv.
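
To make the SFT-then-merge recipe above concrete, the sketch below shows one common merging technique: uniform weight averaging of domain-specialized checkpoints. The checkpoint names are hypothetical, and uniform averaging is an assumption for illustration only; the exact merge recipe used for TinyR1 is described in the open-sourced pipeline and technical report.

```python
# Illustrative model merging via uniform weight averaging. This is NOT
# the exact TinyR1 recipe (see the open-sourced pipeline); the checkpoint
# paths below are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM

checkpoints = ["sft-math", "sft-coding", "sft-science"]  # hypothetical SFT outputs
models = [
    AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float32)
    for path in checkpoints
]
param_dicts = [dict(m.named_parameters()) for m in models]

merged = models[0]  # reuse the first model as the container for merged weights
with torch.no_grad():
    for name, param in merged.named_parameters():
        # Average the corresponding tensor across all domain checkpoints.
        param.copy_(torch.stack([d[name] for d in param_dicts]).mean(dim=0))

merged.save_pretrained("tinyr1-merged-sketch")
```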

Good For

  • Complex Reasoning Tasks: Ideal for applications requiring advanced problem-solving in mathematics, code generation, and scientific inquiry.
  • Research and Development: Serves as an experimental research model for advancing AI reasoning capabilities, particularly for those interested in its unique distillation and merging methodology.
  • Benchmarking: Useful for evaluating and comparing reasoning performance against other models in specialized domains.
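
As a rough illustration of the reasoning use cases above, the sketch below sends a single math prompt through the model's chat template; the prompt and sampling parameters are assumptions for demonstration, not recommended settings from the model card.

```python
# Hedged usage sketch for a reasoning prompt. Sampling settings are
# illustrative assumptions, not recommendations from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/TinyR1-32B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Prove that the sum of two odd integers is even."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```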