formalmathatepfl/apertus-cpt-sft-classic

Text generation | Model size: 8B | Quantization: FP8 | Context length: 32k | Published: May 25, 2026 | Architecture: Transformer

formalmathatepfl/apertus-cpt-sft-classic is an 8-billion-parameter causal language model fine-tuned from the swiss-ai/Apertus-8B-2509 base model. It was trained with supervised fine-tuning (SFT) on the sft dataset using a 32768-token context length, and it reached a final validation loss of 0.0877 on that fine-tuning task.
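A validation loss is often easier to read as a perplexity. The card does not say how the loss is computed, so the sketch below assumes that 0.0877 is the mean token-level cross-entropy in nats on the held-out split of the sft data. Under that assumption, perplexity is simply exp(loss). The helper name perplexity_from_loss is hypothetical.

```python
import math


def perplexity_from_loss(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity.

    Assumption: the reported validation loss is the average token-level
    cross-entropy using the natural logarithm; the model card does not say so.
    """
    if mean_nll_nats < 0:
        raise ValueError("cross-entropy loss cannot be negative")
    return math.exp(mean_nll_nats)


if __name__ == "__main__":
    # Final validation loss reported on the model card.
    val_loss = 0.0877
    print(f"validation perplexity ~ {perplexity_from_loss(val_loss):.4f}")
```

Under that assumption, the value comes out at roughly 1.09. Because it is measured on in-distribution fine-tuning data, it says little about performance on other tasks. It is also not comparable with models that use a different tokenizer.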


Model Overview

formalmathatepfl/apertus-cpt-sft-classic is an 8-billion-parameter language model derived from swiss-ai/Apertus-8B-2509. It was supervised fine-tuned (SFT) for one epoch on the sft dataset and reached a validation loss of 0.0877. Training used the AdamW optimizer with a cosine learning-rate schedule and a warmup ratio of 0.05, meaning the learning rate is warmed up over the first 5% of training steps.
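The learning-rate schedule described above can be sketched as follows. The sketch assumes the common shape of linear warmup followed by cosine decay to zero, which is also the shape of Hugging Face's get_cosine_schedule_with_warmup. It rounds warmup steps up from 0.05 * total_steps. The card does not confirm this exact implementation. The total step count in the example is hypothetical, because the card reports neither dataset size nor batch size.

```python
import math


def warmup_steps(total_steps: int, warmup_ratio: float = 0.05) -> int:
    # Round up so that a non-zero ratio always yields at least one warmup step.
    return math.ceil(total_steps * warmup_ratio)


def cosine_lr(step: int, total_steps: int, peak_lr: float = 1e-5,
              warmup_ratio: float = 0.05) -> float:
    """Learning rate at `step`: linear warmup, then half-period cosine decay to 0.

    Assumption: this mirrors the usual warmup-plus-cosine schedule; the card
    only states "cosine" with warmup ratio 0.05 and peak learning rate 1e-05.
    """
    n_warmup = warmup_steps(total_steps, warmup_ratio)
    if step < n_warmup:
        # Linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / max(1, n_warmup)
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min((step - n_warmup) / max(1, total_steps - n_warmup), 1.0)
    # peak_lr at progress 0, 0.0 at progress 1.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps in the single epoch
    for s in (0, 25, 50, 525, 1000):
        print(s, f"{cosine_lr(s, total):.3e}")
```

With 1000 hypothetical steps, warmup lasts 50 steps. The rate peaks at 1e-05 at step 50, falls to half that midway through the decay phase, and reaches zero at the end of the epoch.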

Key Training Details

  • Base Model: swiss-ai/Apertus-8B-2509
  • Fine-tuning Method: Supervised Fine-Tuning (SFT)
  • Parameters: 8 billion
  • Context Length: 32768 tokens
  • Final Validation Loss: 0.0877
  • Optimizer: AdamW (adamw_torch, the PyTorch implementation) with betas=(0.9, 0.999) and epsilon=1e-08
  • Learning Rate: 1e-05
  • LR Scheduler: cosine, warmup ratio 0.05
  • Epochs: 1
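To make the optimizer settings concrete, here is a minimal pure-Python sketch of AdamW updates using the listed learning rate, betas and epsilon. The card does not report weight decay, so weight_decay=0.0 is an assumption. The class AdamWState is a hypothetical illustration, not the torch.optim.AdamW implementation actually used in training.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AdamWState:
    """Hypothetical per-parameter AdamW state for a flat list of float parameters."""
    lr: float = 1e-5
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8
    weight_decay: float = 0.0  # assumption: not reported on the model card
    step: int = 0
    m: list = field(default_factory=list)  # first-moment (mean) estimates
    v: list = field(default_factory=list)  # second-moment (uncentered variance) estimates

    def update(self, params: list, grads: list) -> list:
        """Return updated parameters after one AdamW step."""
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.step += 1
        # Bias corrections compensate for the zero-initialized moment estimates.
        bc1 = 1.0 - self.beta1 ** self.step
        bc2 = 1.0 - self.beta2 ** self.step
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            # Decoupled weight decay: shrink the weight directly, not via the gradient.
            p = p * (1.0 - self.lr * self.weight_decay)
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            m_hat = self.m[i] / bc1
            v_hat = self.v[i] / bc2
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params
```

For example, AdamWState().update([1.0, 0.0], [2.0, -0.5]) returns values close to [1.0 - 1e-05, 1e-05]. After bias correction, the first step moves each parameter by roughly the learning rate, in the direction opposite its gradient.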

Intended Uses & Limitations

As a supervised fine-tuned model, it is best suited to tasks that resemble its training data. The available information does not specify intended uses or known limitations, so evaluate the model on your target task before deploying it, keeping its fine-tuning objective in mind.
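One concrete deployment constraint is the 32768-token context window: prompt tokens and generated tokens together must fit inside it. A minimal sketch of budgeting a tokenized prompt against that window follows. fit_prompt is a hypothetical helper. Left-truncation, which drops the oldest tokens, is an illustrative choice, not documented behavior of this model.

```python
CONTEXT_LENGTH = 32768  # context length reported on the model card


def fit_prompt(prompt_ids: list[int], max_new_tokens: int,
               context_length: int = CONTEXT_LENGTH) -> list[int]:
    """Hypothetical helper: left-truncate prompt token IDs so that
    prompt + generated tokens fit inside the context window."""
    if not 0 <= max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be in [0, context_length)")
    budget = context_length - max_new_tokens
    # Keep the most recent tokens; the earliest context is dropped first.
    return prompt_ids[-budget:] if len(prompt_ids) > budget else prompt_ids
```

Left-truncation keeps the most recent context, but it can drop an initial instruction or system prompt. For chat-style inputs, trimming whole earlier turns is usually a safer alternative.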