bespokelabs/Bespoke-Stratos-7B

Hugging Face
Text generation · Concurrency cost: 1 · Model size: 7.6B · Quant: FP8 · Context length: 32k · Published: Jan 22, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

Bespoke-Stratos-7B, by bespokelabs, is a 7.6-billion-parameter language model fine-tuned from Qwen2.5-7B-Instruct. It is optimized for mathematical reasoning and complex problem-solving, and it outperforms its base model on benchmarks such as AIME2024 and MATH500. The model is designed for applications that require strong analytical capability, particularly in quantitative domains.

Model Overview

Bespoke-Stratos-7B is a 7.6-billion-parameter language model developed by bespokelabs and fine-tuned from Qwen2.5-7B-Instruct. Its main differentiator is stronger mathematical reasoning and problem-solving, achieved through fine-tuning on the Bespoke-Stratos-17k dataset. This dataset was created by distilling DeepSeek-R1 with a modified version of the data pipeline from Berkeley NovaSky's Sky-T1.
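As a usage illustration, here is a minimal inference sketch with the Hugging Face transformers library. Only the checkpoint name comes from the card; the dtype, sampling settings, and token budget are illustrative assumptions (a generous generation budget is sensible, since models distilled from DeepSeek-R1 traces tend to emit long reasoning before the final answer).

```python
# Minimal inference sketch for bespokelabs/Bespoke-Stratos-7B.
# Assumes the checkpoint follows the standard Qwen2.5 chat template;
# dtype and sampling settings below are illustrative, not from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bespokelabs/Bespoke-Stratos-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; use float16 on older GPUs
    device_map="auto",
)

messages = [
    {"role": "user", "content": "What is the remainder when 2^100 is divided by 7?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Large max_new_tokens leaves room for a long chain-of-thought trace.
outputs = model.generate(inputs, max_new_tokens=2048, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```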

Key Capabilities & Performance

The model shows significant improvements over its base model, Qwen2.5-7B-Instruct, across several challenging benchmarks:

  • AIME2024: Achieves 20.0, doubling the performance of Qwen2.5-7B-Instruct (10.0).
  • MATH500: Scores 82.0, outperforming Qwen2.5-7B-Instruct (74.2).
  • GPQA-Diamond: Reaches 37.8, an improvement over Qwen2.5-7B-Instruct (33.3).
  • LiveCodeBench v2 (All): Scores 36.1, surpassing Qwen2.5-7B-Instruct (31.9).

Training Details

Bespoke-Stratos-7B was trained for 7 hours on 8x H100 GPUs. Key hyperparameters included a learning rate of 1e-05, a total training batch size of 96, and 3 epochs, with a cosine learning-rate scheduler and a warmup ratio of 0.1. The model is released under the Apache 2.0 License.
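For reference, these hyperparameters map onto a Hugging Face TrainingArguments configuration roughly as sketched below. Only the values named above come from the card; the per-device batch size and gradient-accumulation split across the 8 GPUs, and the use of bf16, are assumptions.

```python
# Hypothetical reconstruction of the reported fine-tuning setup.
# Values from the card: lr 1e-05, total batch size 96, 3 epochs,
# cosine schedule, warmup ratio 0.1. Everything else is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bespoke-stratos-7b-sft",
    learning_rate=1e-5,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    # One possible split reaching the total batch size of 96 on 8 GPUs:
    per_device_train_batch_size=4,  # 4 x 8 GPUs = 32 per step (assumed)
    gradient_accumulation_steps=3,  # 32 x 3 = 96 effective (assumed)
    bf16=True,                      # assumed for H100 training
)
```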