formalmathatepfl/Apertus-8B-finetuned

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Mar 16, 2026 | License: other | Architecture: Transformer | Cold

formalmathatepfl/Apertus-8B-finetuned is an 8-billion-parameter language model fine-tuned from the Apertus_CPT_eval base model on the finetuning_data dataset. Details on its specific capabilities and intended uses have not yet been provided.


Model Overview

formalmathatepfl/Apertus-8B-finetuned is an 8-billion-parameter language model derived from the Apertus_CPT_eval base model. It was fine-tuned on the finetuning_data dataset.

Training Details

The model was trained for a single epoch with a learning rate of 0.0001. Training ran on 8 GPUs with a per-device train_batch_size of 2 and gradient_accumulation_steps of 2, giving an effective total_train_batch_size of 32 (2 × 2 × 8). The eval_batch_size was 8. The optimizer was AdamW with default betas and epsilon, paired with a cosine learning rate scheduler and a warmup ratio of 0.03.
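The batch-size arithmetic and the shape of the learning rate schedule can be illustrated with a short Python sketch. This is not the training code used for this model. The function names and the total step count of 1000 are hypothetical, and the schedule assumes linear warmup followed by cosine decay to zero, which is a common convention but is not confirmed by the model documentation.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices):
    """Number of samples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def cosine_lr_with_warmup(step, total_steps, peak_lr=1e-4, warmup_ratio=0.03):
    """Learning rate at `step`, assuming linear warmup then cosine decay to 0.

    `total_steps` is a hypothetical value; the real step count is not
    stated in the model documentation.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # 2 samples per device, 2 accumulation steps, 8 GPUs.
    print(effective_batch_size(2, 2, 8))
    # Sample the schedule at a few points of a hypothetical 1000-step run.
    for s in (0, 15, 30, 500, 1000):
        print(s, cosine_lr_with_warmup(s, total_steps=1000))
```

With the reported values, the first function reproduces the total_train_batch_size of 32. The schedule rises to the 0.0001 peak over the first 3% of steps and then decays smoothly toward zero by the end of the run.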

Current Status

Details on the model's specific capabilities, intended uses, limitations, and evaluation results are not yet available in the provided documentation. Users should check future updates for information on its performance and applications.