MetaStoneTec/MetaStone-L1-7B

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Context Length: 32k · Published: Mar 12, 2025 · Architecture: Transformer

MetaStone-L1-7B is a 7.6-billion-parameter language model developed by MetaStoneTec, based on DeepSeek-R1-Distill-Qwen-7B. It is the lite reasoning model in the MetaStone series and is designed to excel at hard downstream tasks. On core reasoning benchmarks, including mathematics and code, it achieves state-of-the-art results among comparably sized models, with performance comparable to larger API models such as Claude-3.5-Sonnet-1022 and GPT4o-0513.


Overview

MetaStone-L1-7B is a 7.6-billion-parameter model from MetaStoneTec, serving as the "lite" reasoning model in the MetaStone series. It is built on DeepSeek-R1-Distill-Qwen-7B and is specifically engineered for strong performance on complex reasoning tasks.

Key Capabilities & Performance

  • Advanced Reasoning: Achieves state-of-the-art results among similarly sized models on core reasoning benchmarks, particularly mathematics and code.
  • Competitive Performance: Demonstrates performance comparable to larger, proprietary API models such as Claude-3.5-Sonnet-1022 and GPT4o-0513.
  • Optimized for Hard Tasks: Designed to excel in challenging downstream tasks requiring robust reasoning abilities.

Usage Guidelines

To maximize performance, MetaStoneTec recommends specific settings and prompt formatting:

  • Thinking Prompt Format: Format the model input as <|User|> [your prompt] <|Assistant|><think> so the model opens with an explicit reasoning phase (see the sketch after this list).
  • Generation Parameters: Use a temperature of 0.6, a top-p (nucleus sampling) value of 0.95, and a maximum generation length of 32,768 tokens.
  • Standardized Output: For benchmarking, append hints such as "Please reason step by step, and put your final answer within \boxed{}" for math problems, and give explicit code-formatting instructions for programming problems.
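
Putting these recommendations together, the snippet below is a minimal sketch of one way to run the model with Hugging Face transformers. The prompt template, sampling settings, and math hint follow the guidelines above; the loading details (torch_dtype, device_map) and the use of a raw string template rather than tokenizer.apply_chat_template are illustrative assumptions, not the only supported path.

```python
# Minimal sketch: running MetaStone-L1-7B with the recommended settings.
# Assumes the model is available from the Hugging Face Hub under
# "MetaStoneTec/MetaStone-L1-7B"; loading options are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MetaStoneTec/MetaStone-L1-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights for a 7.6B model on GPU
    device_map="auto",
)

question = "What is the sum of the first 100 positive integers?"
# Recommended format: <|User|> [prompt] <|Assistant|><think>, with the math
# hint appended so the final answer lands in \boxed{}.
prompt = (
    f"<|User|>{question} "
    "Please reason step by step, and put your final answer within \\boxed{}."
    "<|Assistant|><think>"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,       # recommended temperature
    top_p=0.95,            # recommended top sampling probability
    max_new_tokens=32768,  # recommended maximum generation length
)
# Strip the prompt tokens and print only the newly generated text.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

Because do_sample=True, outputs will vary across runs; the trailing <think> token simply primes the model to emit its reasoning before producing the final boxed answer.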

Ideal Use Cases

  • Applications requiring strong mathematical problem-solving.
  • Code generation and understanding tasks.
  • Scenarios where a smaller model needs to perform complex reasoning comparable to larger, more resource-intensive alternatives.