mlfoundations-dev/llama3-1_8b_r1_annotated_aops

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · License: llama3.1 · Architecture: Transformer

The mlfoundations-dev/llama3-1_8b_r1_annotated_aops model is a 7.6-billion-parameter language model fine-tuned from Meta-Llama-3.1-8B on the mlfoundations-dev/r1_annotated_aops dataset, reaching a final validation loss of 0.6034. It is adapted to the domain of its fine-tuning data and is intended for tasks aligned with that dataset.


Model Overview

This model, llama3-1_8b_r1_annotated_aops, is a fine-tuned variant of Meta-Llama-3.1-8B, developed by mlfoundations-dev. It comprises approximately 7.6 billion parameters and was fine-tuned with a maximum context length of 131,072 tokens. The fine-tuning process used the mlfoundations-dev/r1_annotated_aops dataset and produced a final validation loss of 0.6034.
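
A minimal loading sketch with the Hugging Face transformers library, assuming the checkpoint is published on the Hub under the repo id used in this card; the bf16 dtype is an assumption and should be adjusted to your hardware:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/llama3-1_8b_r1_annotated_aops"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference; use fp16/fp32 as needed
    device_map="auto",           # place weights automatically across available devices
)
```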

Training Details

The model was trained for 3 epochs with a learning rate of 5e-06 and a total batch size of 512 across 32 GPUs. The optimizer was AdamW (adamw_torch) with default betas and epsilon, paired with a constant learning-rate scheduler. Validation loss decreased steadily, from 0.6528 after epoch 1 to 0.6034 after epoch 3.
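
The reported hyperparameters can be expressed as Hugging Face TrainingArguments, as in the sketch below. Only the totals come from this card; the split of the global batch size of 512 into per-device batch and gradient accumulation steps is an assumption (32 GPUs × 1 per device × 16 accumulation steps = 512), as is bf16 mixed precision:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-1_8b_r1_annotated_aops",
    num_train_epochs=3,              # from the card
    learning_rate=5e-06,             # from the card
    lr_scheduler_type="constant",    # constant LR scheduler, per the card
    optim="adamw_torch",             # AdamW with default betas and epsilon
    per_device_train_batch_size=1,   # assumption
    gradient_accumulation_steps=16,  # assumption: 32 GPUs * 1 * 16 = 512 global batch
    bf16=True,                       # assumption
)
```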

Potential Use Cases

Given its fine-tuning on the r1_annotated_aops dataset, this model is best suited to applications and research aligned with that dataset's content and domain. Developers should evaluate its performance on tasks matching that domain before deploying it; a usage sketch follows below.
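
Continuing from the loading snippet in Model Overview, a hypothetical inference sketch follows. The plain-text prompt format and the example problem are assumptions, not part of this card; inspect the tokenizer's chat template (if one is defined) for the format used during fine-tuning:

```python
# Example competition-math style prompt (hypothetical; format is an assumption).
prompt = "Find all real solutions of x^2 - 5x + 6 = 0. Show your reasoning."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```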