NovaSky-AI/Sky-T1-32B-Flash

Available on Hugging Face

  • Task: Text Generation
  • Model Size: 32.8B parameters
  • Quantization: FP8
  • Context Length: 32k tokens
  • Concurrency Cost: 2
  • Published: Jan 23, 2025
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights

NovaSky-AI/Sky-T1-32B-Flash is a 32.8 billion parameter reasoning model developed by the NovaSky Team at Sky Computing Lab, UC Berkeley. It is preference-optimized to significantly reduce generation lengths while maintaining accuracy, achieving up to a 57% reduction in output length compared to its preview version. This model excels in math and coding tasks, offering performance on par with other leading models but with more concise outputs.


Model Overview

NovaSky-AI/Sky-T1-32B-Flash is a 32.8 billion parameter reasoning model developed by the NovaSky Team at Sky Computing Lab, UC Berkeley. It is an optimized version of Sky-T1-32B-Preview, specifically engineered to reduce the length of generated responses without compromising accuracy, particularly in math and coding domains. This optimization results in up to a 57% reduction in generation lengths on hard coding tasks.
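
A minimal usage sketch with Hugging Face transformers is shown below. It assumes a recent transformers release and enough GPU memory (or a multi-GPU or quantized setup) to hold the 32.8B-parameter weights; the prompt is only an example.

```python
# Minimal sketch: loading Sky-T1-32B-Flash with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NovaSky-AI/Sky-T1-32B-Flash"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # spread layers across available GPUs
)

messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```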

Key Capabilities & Optimizations

  • Concise Reasoning: Significantly reduces output length (e.g., 33% on Math500, 57% on LCB Hard) compared to its predecessor, Sky-T1-32B-Preview, while maintaining comparable accuracy.
  • Strong Performance: Achieves performance in math and coding tasks on par with models like o1-preview, as demonstrated by evaluations on benchmarks such as Math500, AIME24, and LCB (Easy, Medium, Hard).
  • Preference Optimization: Trained with Simple Preference Optimization (SimPO) on 10K preference pairs in the math and coding domains generated from Sky-T1-32B-Preview (a sketch of the objective follows this list).
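
For orientation, the sketch below shows the general shape of the SimPO objective: a reference-free preference loss with length-normalized rewards, which is what pushes the model toward shorter responses. It is an illustrative reconstruction of the published SimPO formulation, not the NovaSky training code, and the beta/gamma hyperparameters are placeholders.

```python
# Illustrative SimPO loss sketch (not the NovaSky training code).
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
               beta=2.0, gamma=0.5):  # placeholder hyperparameters
    """SimPO loss over a batch of preference pairs.

    chosen_logps / rejected_logps: summed policy log-probabilities of the
    preferred (concise, correct) and dispreferred (verbose) responses.
    chosen_lens / rejected_lens: response lengths in tokens, used for the
    length normalization that penalizes needlessly long generations.
    """
    # Length-normalized implicit rewards; no reference model is needed.
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens
    # Bradley-Terry style objective with a target reward margin gamma.
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()
```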

When to Use This Model

This model is ideal for applications where efficient and concise reasoning outputs are critical, especially in:

  • Mathematical Problem Solving: For tasks requiring accurate yet shorter explanations or solutions.
  • Code Generation and Analysis: When developers need precise code-related outputs without excessive verbosity.
  • Resource-Constrained Environments: Its reduced output length can lead to lower inference costs and faster processing, making it suitable for scenarios where token usage is a concern.
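
A minimal inference sketch follows. It assumes an OpenAI-compatible chat completions endpoint (Featherless exposes one; the base URL and API key below are placeholders to replace with your own) and uses max_tokens to keep per-request token costs bounded.

```python
# Minimal sketch: calling Sky-T1-32B-Flash through an OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed endpoint; substitute your own
    api_key="YOUR_API_KEY",                    # placeholder
)

response = client.chat.completions.create(
    model="NovaSky-AI/Sky-T1-32B-Flash",
    messages=[{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
    max_tokens=512,  # cap output length to keep token costs predictable
)
print(response.choices[0].message.content)
```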

Popular Sampler Settings

The most popular parameter combinations used by Featherless users for this model cover the following samplers: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
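
These samplers map directly onto a standard chat completions request. The sketch below reuses the client from the earlier example; the values are placeholders rather than the user-popular configs summarized above, and the samplers outside the OpenAI spec (top_k, repetition_penalty, min_p) are passed through the OpenAI client's extra_body field.

```python
# Passing sampler settings in a chat completions request (placeholder values).
response = client.chat.completions.create(
    model="NovaSky-AI/Sky-T1-32B-Flash",
    messages=[{"role": "user", "content": "What is the sum of the first 100 positive integers?"}],
    temperature=0.7,
    top_p=0.95,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    max_tokens=512,
    extra_body={               # samplers outside the OpenAI spec go here
        "top_k": 40,
        "repetition_penalty": 1.05,
        "min_p": 0.05,
    },
)
print(response.choices[0].message.content)
```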