bespokelabs/Bespoke-Stratos-7B

Parameters: 7.6B
Quantization: FP8
Context length: 131,072 tokens
Released: Jan 22, 2025
License: apache-2.0

Model Overview

Bespoke-Stratos-7B is a 7.6-billion-parameter language model from Bespoke Labs, fine-tuned from Qwen2.5-7B-Instruct. Its main differentiator is stronger mathematical reasoning and problem solving, achieved by fine-tuning on the Bespoke-Stratos-17k dataset, which was built by distilling DeepSeek-R1 with a modified version of Berkeley NovaSky's Sky-T1 data pipeline.
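
A short usage sketch with the Hugging Face transformers library follows. The prompt and generation settings are illustrative assumptions, not an official recipe; the chat template is inherited from Qwen2.5-7B-Instruct.

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "bespokelabs/Bespoke-Stratos-7B"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      torch_dtype=torch.bfloat16,  # assumes a bf16-capable GPU
      device_map="auto",
  )

  # Chat formatting follows the Qwen2.5 template inherited from the base model.
  messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
  input_ids = tokenizer.apply_chat_template(
      messages, add_generation_prompt=True, return_tensors="pt"
  ).to(model.device)

  # Reasoning-tuned models emit long chains of thought, so allow a generous token budget.
  output = model.generate(input_ids, max_new_tokens=2048)
  print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))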

Key Capabilities & Performance

The model shows significant improvements over its base model, Qwen2.5-7B-Instruct, across several challenging benchmarks:

  • AIME2024: Achieves 20.0, doubling the performance of Qwen2.5-7B-Instruct (10.0).
  • MATH500: Scores 82.0, outperforming Qwen2.5-7B-Instruct (74.2).
  • GPQA-Diamond: Reaches 37.8, an improvement over Qwen2.5-7B-Instruct (33.3).
  • LiveCodeBench v2 (All): Scores 36.1, surpassing Qwen2.5-7B-Instruct (31.9).

Training Details

Bespoke-Stratos-7B was fine-tuned for 7 hours on 8×H100 GPUs. Key hyperparameters: a learning rate of 1e-05, a total train batch size of 96, and 3 epochs, with a cosine learning-rate scheduler and a warmup ratio of 0.1. The model is released under the Apache 2.0 license.
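
The reported hyperparameters map onto a transformers TrainingArguments configuration roughly as sketched below. This is a hypothetical reconstruction, not the published training script; in particular, the per-device batch size and gradient-accumulation split across the 8 GPUs are assumptions chosen so the effective batch size works out to 96.

  from transformers import TrainingArguments

  # Hypothetical reconstruction of the reported setup (not the official script).
  # Effective batch size: 8 GPUs x 4 per device x 3 accumulation steps = 96 (assumed split).
  args = TrainingArguments(
      output_dir="bespoke-stratos-7b-sft",  # hypothetical path
      num_train_epochs=3,
      learning_rate=1e-5,
      per_device_train_batch_size=4,   # assumption
      gradient_accumulation_steps=3,   # assumption
      lr_scheduler_type="cosine",
      warmup_ratio=0.1,
      bf16=True,  # assumption; typical for H100 training
  )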