mlfoundations-dev/llama3-1_8b_4o_annotated_olympiads

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · License: apache-2.0 · Architecture: Transformer · Open Weights

This is a 7.6-billion-parameter causal language model, fine-tuned by mlfoundations-dev from Qwen/Qwen2.5-7B-Instruct. It has a context length of 131072 tokens and was fine-tuned specifically on the 4o_annotated_olympiads dataset, suggesting a focus on complex reasoning and problem-solving.


Model Overview

This model, llama3-1_8b_4o_annotated_olympiads, is a 7.6-billion-parameter language model developed by mlfoundations-dev. It is a fine-tuned variant of Qwen/Qwen2.5-7B-Instruct, adapted using the mlfoundations-dev/4o_annotated_olympiads dataset. The model supports a substantial context length of 131072 tokens, enabling it to process extensive inputs for complex tasks.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen2.5-7B-Instruct.
  • Parameter Count: 7.6 billion parameters.
  • Context Length: 131072 tokens.
  • Training Data: Specialized fine-tuning on the 4o_annotated_olympiads dataset, indicating a focus on tasks related to competitive problem-solving or academic challenges.

Training Details

The fine-tuning process used a learning rate of 1e-05, a total batch size of 96 (3 gradient accumulation steps across 32 GPUs), and ran for 3 epochs. The optimizer was AdamW with standard betas and epsilon, paired with a cosine learning-rate scheduler using a 0.1 warmup ratio. Training used Transformers 4.46.1, PyTorch 2.5.1, Datasets 3.0.2, and Tokenizers 0.20.3.
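The batch-size arithmetic and the learning-rate schedule described above can be sketched in plain Python. This is a minimal illustration, not the training code: the total step count is hypothetical, and the schedule mirrors the shape of the standard "cosine with warmup" scheduler (linear warmup to the peak rate, then cosine decay toward zero) rather than reproducing any library's exact implementation.

```python
import math

# Hyperparameters reported in the model card
LEARNING_RATE = 1e-05
WARMUP_RATIO = 0.1
NUM_GPUS = 32
GRAD_ACCUM_STEPS = 3
TOTAL_BATCH_SIZE = 96

# The per-device batch size implied by the card's numbers:
# 96 total = per_device_batch * 32 GPUs * 3 accumulation steps
per_device_batch = TOTAL_BATCH_SIZE // (NUM_GPUS * GRAD_ACCUM_STEPS)  # -> 1

def cosine_lr_with_warmup(step, total_steps,
                          peak_lr=LEARNING_RATE,
                          warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr over the first warmup_ratio of
    training, then cosine decay toward zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

if __name__ == "__main__":
    total = 1000  # hypothetical optimizer-step count for illustration
    print(per_device_batch)                     # 1
    print(cosine_lr_with_warmup(0, total))      # 0.0 (start of warmup)
    print(cosine_lr_with_warmup(100, total))    # 1e-05 (peak, warmup done)
    print(cosine_lr_with_warmup(total, total))  # ~0.0 (end of decay)
```

With a 0.1 warmup ratio, the rate climbs linearly for the first 10% of steps and then follows the cosine curve down, which is why the peak learning rate is only briefly in effect.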

Intended Use Cases

Given its fine-tuning on the Olympiads dataset, this model is likely best suited for:

  • Complex Reasoning: Tasks requiring advanced logical deduction and problem-solving skills.
  • Academic Support: Applications related to competitive mathematics, science, or similar academic challenges.
  • Specialized Q&A: Answering questions that demand deep understanding and analytical capabilities, particularly within the domain of the Olympiads dataset.