bespokelabs/Bespoke-Stratos-32B

Hugging Face
Text generation · Model size: 32.8B · Quantization: FP8 · Context length: 32k · Concurrency cost: 2 · Published: Jan 22, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

Bespoke-Stratos-32B is a 32.8 billion parameter language model developed by Bespoke Labs, fine-tuned from Qwen/Qwen2.5-32B-Instruct. It is optimized for advanced reasoning tasks and shows improved performance on benchmarks such as AIME2024 and MATH500, making it well suited to applications that require strong analytical and problem-solving capabilities.


Bespoke-Stratos-32B Overview

Bespoke-Stratos-32B is a 32.8 billion parameter language model from Bespoke Labs, fine-tuned from Qwen/Qwen2.5-32B-Instruct. The fine-tuning used the Bespoke-Stratos-17k dataset, a collection of reasoning traces distilled from DeepSeek-R1 using a modified version of Berkeley NovaSky's Sky-T1 data pipeline.
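
Since the checkpoint ships in the standard Hugging Face format with a chat template, it can be loaded with the transformers library. Below is a minimal inference sketch; the example prompt and generation settings are illustrative and not taken from the model card:

```python
# Minimal inference sketch; assumes enough GPU memory for a 32B model
# (or multiple GPUs, sharded via device_map="auto").
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bespokelabs/Bespoke-Stratos-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # shard across available GPUs
)

messages = [
    {"role": "user", "content": "If 3x + 7 = 22, what is x?"}  # illustrative prompt
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models tend to emit long chains of thought, so allow a
# generous token budget.
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```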

Key Capabilities & Performance

Bespoke-Stratos-32B outperforms its base model, Qwen2.5-32B-Instruct, and comparable models such as Sky-T1-32B on complex reasoning benchmarks. Notable results include:

  • AIME2024: 63.3
  • MATH500: 93.0
  • GPQA-Diamond: 58.1
  • LiveCodeBench v2 (All): 71.1

Together, these scores indicate strong aptitude in competition mathematics (AIME2024, MATH500), graduate-level science question answering (GPQA-Diamond), and code generation (LiveCodeBench).
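
For context, benchmarks of this kind are typically scored by exact match on a final boxed answer. The sketch below shows one hypothetical way to do that; extract_boxed_answer and the scoring loop are illustrative assumptions, not the evaluation code behind the numbers above:

```python
# Hypothetical exact-match scoring in the style of MATH500-type benchmarks;
# not the published evaluation harness for these results.
import re

def extract_boxed_answer(text: str) -> str | None:
    """Pull the last \\boxed{...} answer from a model completion.

    Simplification: ignores nested braces, which a real harness would handle.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def exact_match_accuracy(completions: list[str], references: list[str]) -> float:
    """Fraction of completions whose boxed answer matches the reference."""
    hits = sum(
        extract_boxed_answer(c) == r.strip()
        for c, r in zip(completions, references)
    )
    return hits / len(references)
```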

Training Details

The model was trained for 27 hours using 8xH100 GPUs. Key hyperparameters included a learning rate of 1e-05, a total training batch size of 96, and 3 epochs, utilizing an AdamW optimizer with a cosine learning rate scheduler.
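
For reference, the reported hyperparameters map onto a standard Hugging Face TrainingArguments configuration roughly as follows. The per-device batch size / gradient-accumulation split and the mixed-precision setting are assumptions, since the card only reports the total batch size of 96:

```python
# Sketch of the reported hyperparameters as TrainingArguments; the trainer
# setup and dataset handling are omitted, and flagged values are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bespoke-stratos-32b-sft",
    learning_rate=1e-5,                  # reported learning rate
    num_train_epochs=3,                  # reported epoch count
    per_device_train_batch_size=1,       # assumption: per-device size
    gradient_accumulation_steps=12,      # 8 GPUs x 1 x 12 = 96 total batch
    lr_scheduler_type="cosine",          # reported scheduler
    optim="adamw_torch",                 # reported optimizer family
    bf16=True,                           # assumption: bf16 mixed precision
)
```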

Intended Uses

This model is well suited to applications that demand robust reasoning and analytical capability, such as advanced question answering, complex problem-solving, and tasks requiring high accuracy in logical deduction. It is released under the Apache 2.0 license.