shengjia-toronto/ssft-32B-N6
shengjia-toronto/ssft-32B-N6 is a 32.8 billion parameter language model developed by Sheng Jia and collaborators, designed for parallel reasoning via Set Supervised Fine-Tuning (SSFT) with Global Forking Tokens. The model performs strongly on complex reasoning tasks, including mathematical and general question-answering benchmarks such as AIME and MATH-500, and is specifically optimized to leverage multiple parallel generations for improved accuracy through majority voting (Cons@k).
Overview
shengjia-toronto/ssft-32B-N6 is a 32.8 billion parameter language model from Sheng Jia and collaborators that enhances reasoning through Set Supervised Fine-Tuning (SSFT) with Global Forking Tokens. This method trains the model to generate and evaluate multiple reasoning paths in parallel, significantly improving accuracy on complex tasks.
Key Capabilities
- Parallel Reasoning: Utilizes "Global Forking Tokens" (e.g., `<think1>` through `<think6>`) to enable the model to explore multiple reasoning trajectories simultaneously.
- Enhanced Accuracy with Majority Voting: Achieves higher performance on challenging benchmarks by aggregating results from parallel generations (Cons@k), outperforming single-generation (Pass@1) results.
- Strong Performance on Reasoning Tasks: Demonstrates notable results on AIME 2024/2025, MATH-500, and GPQA-D datasets, particularly when using Cons@6 and Cons@32 metrics.
- GRPO Fine-tuned Variants: Offers GRPO (Group Relative Policy Optimization) fine-tuned models that simplify prompting by optimizing global forking tokens for optimal tag selection, making them easier to use without manual `<think i>` management.
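The parallel-generation-plus-voting workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not the model's official API: the prompt format (question followed by a forking tag) and the `build_parallel_prompts`/`cons_at_k` helpers are assumptions for demonstration; only the `<think1>`..`<think6>` tag names come from this card.

```python
from collections import Counter

def build_parallel_prompts(question: str, k: int = 6) -> list[str]:
    """One prompt per global forking token, so each of the k generations
    is seeded to explore a different reasoning trajectory.

    The prompt layout here is a hypothetical sketch; consult the model's
    chat template for the real format.
    """
    return [f"{question}\n<think{i}>" for i in range(1, k + 1)]

def cons_at_k(answers: list[str]) -> str:
    """Majority vote (Cons@k) over the final answers extracted from k
    parallel generations (e.g. the contents of \\boxed{...} in each trace)."""
    if not answers:
        raise ValueError("need at least one generation")
    return Counter(answers).most_common(1)[0][0]
```

In practice, each prompt would be sent to the model with sampling enabled and an answer extracted from each completed trace before voting; the GRPO fine-tuned variants mentioned above remove the need to construct these tags manually.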
Good For
- Complex Mathematical Reasoning: Excels in tasks requiring multi-step logical deduction, as evidenced by its high scores on AIME and MATH-500.
- General Question Answering: Improves accuracy on difficult general knowledge and reasoning questions (GPQA-D) by leveraging parallel thought processes.
- Research in Reasoning and LLM Training: Provides a valuable checkpoint for researchers exploring advanced fine-tuning techniques for reasoning, particularly those interested in the SSFT and GRPO methodologies. The associated arXiv paper details the underlying research.