MetaStoneTec/MetaStone-L1-7B
MetaStone-L1-7B is a 7.6 billion parameter language model developed by MetaStoneTec, based on DeepSeek-R1-Distill-Qwen-7B. It is the lite reasoning model of the MetaStone series and is designed for hard downstream tasks. On core reasoning benchmarks, including mathematics and code, it achieves state-of-the-art results among comparably sized models, with performance comparable to larger API models like Claude-3.5-Sonnet-1022 and GPT4o-0513.
Overview
MetaStone-L1-7B is a 7.6 billion parameter model from MetaStoneTec, serving as the "lite" reasoning model in the MetaStone series. It is built on DeepSeek-R1-Distill-Qwen-7B and is engineered to improve performance on complex reasoning tasks.
Key Capabilities & Performance
- Advanced Reasoning: Achieves state-of-the-art results among comparably sized ("parallel-level") models on core reasoning benchmarks, particularly in mathematics and code.
- Competitive Performance: Demonstrates performance comparable to larger, proprietary API models such as Claude-3.5-Sonnet-1022 and GPT4o-0513.
- Optimized for Hard Tasks: Designed to excel in challenging downstream tasks requiring robust reasoning abilities.
Usage Guidelines
To maximize performance, MetaStoneTec recommends specific settings and prompt formatting:
- Thoughtful Output: Format the model input as <|User|> [your prompt] <|Assistant|><think> to encourage detailed reasoning.
- Generation Parameters: Use a temperature of 0.6, a top-p (top sampling probability) of 0.95, and a maximum generation length of 32,768 tokens.
- Standardized Output: For benchmarking, use hints like "Please reason step by step, and put your final answer within \boxed{}" for math, and specific code formatting instructions for programming problems.
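The guidelines above can be sketched in a few lines of Python. This is a minimal illustration, not MetaStoneTec's own code: the helper `build_prompt`, the constant names, and the choice to append the hint on a new line after the user prompt are all assumptions made for the example. The dictionary keys `temperature`, `top_p`, and `max_new_tokens` follow naming that is common in inference libraries; check your own library for the exact parameter names and for whether sampling must be enabled explicitly.

```python
# Sketch only: build_prompt, MATH_HINT and GENERATION_SETTINGS are
# hypothetical names for this example, not part of any MetaStone API.

# Benchmark hint for math problems, quoted from the usage guidelines.
MATH_HINT = "Please reason step by step, and put your final answer within \\boxed{}"

# Recommended sampling settings from the usage guidelines.
# Key names are an assumption; adapt them to your inference library.
GENERATION_SETTINGS = {
    "temperature": 0.6,
    "top_p": 0.95,            # "top sampling probability"
    "max_new_tokens": 32768,  # maximum generation length
}


def build_prompt(user_prompt: str, hint: str = "") -> str:
    """Wrap a prompt in the <|User|> ... <|Assistant|><think> format.

    Appending the hint on its own line is an assumption of this sketch.
    """
    body = user_prompt.strip()
    if hint:
        body = f"{body}\n{hint}"
    return f"<|User|> {body} <|Assistant|><think>"


if __name__ == "__main__":
    print(build_prompt("What is 17 * 24?", MATH_HINT))
    print(GENERATION_SETTINGS)
```

The resulting string and settings would then be handed to whatever inference stack serves the model. If your tokenizer ships a chat template, it may already produce this format, in which case the manual wrapping is unnecessary.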
Ideal Use Cases
- Applications requiring strong mathematical problem-solving.
- Code generation and understanding tasks.
- Scenarios where a smaller model needs to perform complex reasoning comparable to larger, more resource-intensive alternatives.