Steiner-preview: A Reasoning Model
Steiner-preview is a 32.8-billion-parameter model developed by Yichao 'Peak' Ji that aims to reproduce and validate the inference-time scaling behavior of OpenAI's o1. The model is trained with reinforcement learning on synthetic data, which lets it explore multiple reasoning paths during inference, verify outcomes, and backtrack when necessary. In effect, it performs a linear traversal of an implicit search tree during autoregressive generation.
Key Characteristics & Capabilities
- Reasoning Focus: Emphasizes exploring and verifying reasoning paths without explicit Chain of Thought (CoT) prompting.
- Synthetic Data Training: Utilizes reinforcement learning on synthetically generated data.
- High Context Length: Supports a context length of 131,072 tokens.
- Deployment Compatibility: Fully compatible with existing inference services; vLLM is the recommended backend.
- Language Composition: Post-training data is approximately 90% English and 10% Chinese, with reasoning path augmentation primarily in English.
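Because the model works with OpenAI-compatible inference servers such as vLLM, it can be queried over the standard chat-completions endpoint. The sketch below assumes a vLLM server is already running locally (e.g. started with `vllm serve`) on the default port; the model id, port, and helper names are illustrative, not part of any official API.

```python
# Sketch of querying a Steiner-preview deployment through vLLM's
# OpenAI-compatible API. The model id and base_url are assumptions
# that depend on how the server was launched.
import json
import urllib.request

def build_chat_request(prompt: str) -> dict:
    # Per the model card: no custom system prompt, default sampling parameters.
    return {
        "model": "peakji/steiner-32b-preview",  # illustrative model id
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    body = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

With a server running, `ask("...")` returns the completion text as a single turn; note that no system message and no sampling overrides are included in the payload, matching the recommendations below.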
Limitations and Considerations
- Work-in-Progress: Steiner-preview has not yet replicated o1's inference-time scaling: increasing the number of reasoning steps has not improved performance on benchmarks such as MMLU-Pro and GPQA.
- Multi-turn Dialogues: Not recommended for multi-turn conversations, despite being based on Qwen2.5-32B-Instruct.
- Prompting: Custom system prompts and modified sampling parameters (e.g., temperature) are not recommended, as they may interfere with the formatting of the model's reasoning tokens.
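The prompting constraints above can be enforced mechanically before a request is sent. This is a hypothetical helper, not part of the model's tooling: it strips any system message and any sampling overrides from an OpenAI-style payload so that only defaults reach the server.

```python
# Illustrative guard reflecting the model card's recommendations:
# drop custom system prompts and sampling overrides before sending.
# The function and key set are assumptions, not an official API.
SAMPLING_KEYS = {"temperature", "top_p", "top_k",
                 "frequency_penalty", "presence_penalty"}

def sanitize_request(payload: dict) -> dict:
    # Remove any explicit sampling parameters so server defaults apply.
    cleaned = {k: v for k, v in payload.items() if k not in SAMPLING_KEYS}
    # Drop system messages; keep user/assistant turns as-is.
    cleaned["messages"] = [
        m for m in payload.get("messages", []) if m.get("role") != "system"
    ]
    return cleaned
```

For example, a payload containing `"temperature": 0.2` and a system message would come back with both removed, leaving only the user turn and the model id.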
Benchmarks
On the GPQA Diamond benchmark (0-shot without CoT), Steiner-preview achieved an overall accuracy of 53.54%, with notable subdomain scores such as 100% in Condensed Matter Physics and 80% in Molecular Biology.
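As a back-of-the-envelope check on the headline number: GPQA Diamond contains 198 questions, so an overall accuracy of 53.54% corresponds to roughly 106 correct answers. The correct-answer count is inferred from the percentage here, not reported by the source.

```python
# Sanity check: 53.54% of GPQA Diamond's 198 questions.
# The count of ~106 is inferred arithmetic, not a reported figure.
total_questions = 198
accuracy = 0.5354
approx_correct = round(total_questions * accuracy)
print(approx_correct)  # → 106
```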