formalmathatepfl/Apertus-8B-finetuned
formalmathatepfl/Apertus-8B-finetuned is an 8-billion-parameter language model fine-tuned from the Apertus_CPT_eval base model on the finetuning_data dataset. Details on its specific capabilities and intended uses are not yet provided.
Model Overview
formalmathatepfl/Apertus-8B-finetuned is an 8-billion-parameter language model derived from the Apertus_CPT_eval base model. It was fine-tuned on a dataset named finetuning_data.
Training Details
The model was trained for a single epoch with a learning rate of 0.0001. Training ran across 8 GPUs with a per-device train_batch_size of 2 and gradient_accumulation_steps of 2, giving an effective total_train_batch_size of 2 × 2 × 8 = 32. The eval_batch_size was 8. The AdamW optimizer was used with default betas and epsilon, together with a cosine learning rate scheduler and a warmup ratio of 0.03.
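The batch-size arithmetic and the learning rate schedule described above can be sketched in a few lines of plain Python. This is only an illustration, not the training code used for the model: the function names are hypothetical, the total step count of 1000 is a made-up value (the source does not state it), and the schedule shape (linear warmup followed by cosine decay to zero) is an assumption based on how cosine schedulers with a warmup ratio are commonly implemented.

```python
import math

# Values stated in the training details.
PER_DEVICE_TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 2
NUM_GPUS = 8
PEAK_LEARNING_RATE = 1e-4
WARMUP_RATIO = 0.03


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int) -> int:
    """Number of samples that contribute to each optimizer step."""
    return per_device * accumulation_steps * num_devices


def cosine_lr_with_warmup(
    step: int,
    total_steps: int,
    peak_lr: float = PEAK_LEARNING_RATE,
    warmup_ratio: float = WARMUP_RATIO,
) -> float:
    """Learning rate at `step`, assuming linear warmup then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# 2 samples per GPU x 2 accumulation steps x 8 GPUs
print(effective_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_GPUS))  # prints 32

# Hypothetical run length, chosen only to make the schedule concrete.
TOTAL_STEPS = 1000
for step in (0, 10, 30, 500, 1000):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS))
```

With a warmup ratio of 0.03, roughly the first 3% of optimizer steps ramp the learning rate up to 0.0001, after which it decays along a half cosine for the rest of the epoch.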
Current Status
Detailed information on the model's specific capabilities, intended uses, limitations, and evaluation results is not yet available in the provided documentation. Users should check future updates for these details.