mlfoundations-dev/s1K_32b

Text generation | Concurrency cost: 2 | Model size: 32.8B | Quantization: FP8 | Context length: 32K | License: apache-2.0 | Architecture: Transformer | Open weights | Cold

mlfoundations-dev/s1K_32b is a 32.8-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-32B-Instruct on the mlfoundations-dev/s1K_reformat dataset. It supports a 131,072-token context length for processing extensive inputs while maintaining long-range coherence.


Overview

mlfoundations-dev/s1K_32b is a 32.8-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-32B-Instruct base model. It was adapted on the mlfoundations-dev/s1K_reformat dataset, which suggests an emphasis on data reformatting and structured-data processing. The model supports a context window of 131,072 tokens, enabling it to handle long input sequences and maintain context over extended interactions.
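
A minimal loading-and-generation sketch with Hugging Face transformers follows. It assumes the checkpoint is hosted on the Hub under this repo id, inherits Qwen2.5's chat template, and fits on your hardware in bfloat16; none of these details are stated on the card itself.

```python
# Sketch: loading the model with transformers (assumes Hub-hosted weights
# and enough GPU memory for a 32.8B-parameter model in bfloat16).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/s1K_32b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard across available GPUs
)

messages = [{"role": "user", "content": "Summarize the idea of context length in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```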

Training Details

The model was trained with the following key hyperparameters:

  • Base Model: Qwen/Qwen2.5-32B-Instruct
  • Dataset: mlfoundations-dev/s1K_reformat
  • Learning Rate: 1e-05
  • Optimizer: ADAMW_TORCH with betas=(0.9, 0.95)
  • Epochs: 5.0
  • Batch Size: 1 per device (train) and 8 per device (eval) across 16 devices, for effective batch sizes of 16 (train) and 128 (eval).
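
For reference, here is a sketch of how these settings map onto Hugging Face TrainingArguments. Only the values listed above are grounded in the card; the output path is hypothetical, and every other field (scheduler, warmup, precision) is an undocumented library default.

```python
# Sketch: the reported hyperparameters expressed as transformers TrainingArguments.
# Only the listed values come from the card; everything else is a default.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="s1K_32b-finetune",   # hypothetical output path
    learning_rate=1e-5,
    num_train_epochs=5.0,
    per_device_train_batch_size=1,   # x 16 devices = effective train batch of 16
    per_device_eval_batch_size=8,    # x 16 devices = effective eval batch of 128
    optim="adamw_torch",             # the ADAMW_TORCH optimizer named above
    adam_beta1=0.9,
    adam_beta2=0.95,
)
```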

Intended Use Cases

Given its fine-tuning on the s1K_reformat dataset, this model is likely best suited for applications involving:

  • Data Transformation: Tasks requiring reformatting or restructuring of data according to specific patterns (see the sketch after this list).
  • Structured Data Processing: Handling and generating content that adheres to particular formats.
  • Long Context Understanding: Leveraging the 131,072-token context length for tasks that require processing and generating very long documents or conversations.
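
As an illustration of the data-transformation use case, the sketch below wraps a generic reformatting prompt around the model and tokenizer loaded in the earlier example. The helper name, prompt wording, and target format are hypothetical and are not taken from the s1K_reformat dataset.

```python
# Sketch: a hypothetical reformatting helper; reuses `model` and `tokenizer`
# from the loading example above. Prompt and schema are illustrative only.
def reformat(model, tokenizer, raw_text: str, target_format: str) -> str:
    messages = [
        {"role": "system", "content": "You convert records into the requested format exactly."},
        {"role": "user", "content": f"Convert to {target_format}:\n{raw_text}"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=512)
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

print(reformat(model, tokenizer, "name: Ada; born: 1815", "a JSON object"))
```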