mlfoundations-dev/llama3-1_8b_r1_annotated_olympiads
The mlfoundations-dev/llama3-1_8b_r1_annotated_olympiads model is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It features a 131,072 token context length and is specifically adapted using the mlfoundations-dev/r1_annotated_olympiads dataset. This model is optimized for tasks related to the specific domain covered by the Olympiads dataset, suggesting a focus on reasoning or problem-solving within that context.
Loading preview...
Model Overview
This model, llama3-1_8b_r1_annotated_olympiads, is a fine-tuned variant of the Qwen/Qwen2.5-7B-Instruct architecture, featuring approximately 7.6 billion parameters and a substantial 131,072 token context window. It has been specifically adapted using the mlfoundations-dev/r1_annotated_olympiads dataset.
Training Details
The model was trained with a learning rate of 1e-05, a total effective batch size of 96, and utilized a cosine learning rate scheduler with a 0.1 warmup ratio over 3 epochs. The training leveraged a multi-GPU setup with 32 devices and AdamW optimizer. The underlying framework versions include Transformers 4.46.1, Pytorch 2.5.1, Datasets 3.0.2, and Tokenizers 0.20.3.
Potential Use Cases
Given its fine-tuning on the r1_annotated_olympiads dataset, this model is likely specialized for tasks involving:
- Problem-solving and reasoning within the domain of Olympiad-style challenges.
- Understanding and generating responses related to complex academic or competitive problems.
Further details on specific intended uses, limitations, and comprehensive evaluation data are not provided in the current model card.