mlfoundations-dev/llama3-1_8b_4o_annotated_aops
Text generation · Model size: 7.6B · Quantization: FP8 · Context length: 32K · License: apache-2.0 · Architecture: Transformer · Open weights

The mlfoundations-dev/llama3-1_8b_4o_annotated_aops model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct on the mlfoundations-dev/4o_annotated_aops dataset. This training setup suggests the model is optimized for annotated mathematical problems and similarly structured data, making it best suited for specialized, domain-specific applications.


Model Overview

This model, llama3-1_8b_4o_annotated_aops, is a fine-tuned variant of Qwen/Qwen2.5-7B-Instruct with approximately 7.6 billion parameters and a context length of 131,072 tokens. Its main distinguishing feature is its specialized training on the mlfoundations-dev/4o_annotated_aops dataset.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen2.5-7B-Instruct.
  • Parameter Count: 7.6 billion parameters.
  • Context Length: Supports an extensive context window of 131,072 tokens.
  • Specialized Training: Underwent fine-tuning on the mlfoundations-dev/4o_annotated_aops dataset, suggesting an emphasis on tasks related to annotated mathematical problems or structured data analysis.
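A minimal usage sketch follows, assuming the standard Hugging Face transformers API and that the model ships with a chat template. The `solve` helper and its prompt are hypothetical illustrations, not part of the model card:

```python
# Hypothetical usage sketch; assumes the standard transformers API
# and that the repo provides a chat template.
MODEL_ID = "mlfoundations-dev/llama3-1_8b_4o_annotated_aops"

def build_messages(problem: str) -> list[dict]:
    # Wrap a single math problem as a chat-style message list.
    return [{"role": "user", "content": problem}]

def solve(problem: str, max_new_tokens: int = 512) -> str:
    # Heavy imports kept local so the module can be imported
    # without transformers/torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(problem),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the completion.
    return tokenizer.decode(
        output[0][inputs.shape[-1]:], skip_special_tokens=True
    )
```

The long 131,072-token context window means multi-step solutions or several annotated examples can be packed into a single prompt without truncation.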

Training Details

Training was conducted with a learning rate of 1e-05, a total batch size of 96, and a cosine learning-rate schedule with a 0.1 warmup ratio over 3 epochs. The optimizer was AdamW with standard betas and epsilon, running on a multi-GPU setup of 32 devices.
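The reported hyperparameters can be sketched as a transformers-style config. The per-device batch size and gradient-accumulation steps below are assumptions, chosen only so that their product across 32 devices reproduces the stated total batch size of 96; the card does not state the actual split:

```python
# Sketch of the reported training hyperparameters as a plain config dict.
NUM_DEVICES = 32
PER_DEVICE_BATCH = 1   # assumed, not stated on the card
GRAD_ACCUM_STEPS = 3   # assumed, not stated on the card

training_config = {
    "learning_rate": 1e-5,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "num_train_epochs": 3,
    # "Standard betas and epsilon" = the usual AdamW defaults.
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "per_device_train_batch_size": PER_DEVICE_BATCH,
    "gradient_accumulation_steps": GRAD_ACCUM_STEPS,
}

# Effective (total) batch size across all devices.
total_batch_size = NUM_DEVICES * PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
```

Any per-device batch/accumulation pair whose product with the device count equals 96 would match the reported setup equally well.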

Potential Use Cases

Given its fine-tuning on a specific annotated dataset, this model is likely best suited for:

  • Tasks requiring understanding or generation based on structured, annotated data.
  • Applications in domains similar to the 4o_annotated_aops dataset, potentially involving mathematical reasoning or problem-solving with explicit annotations.

Users should review the contents of the 4o_annotated_aops dataset to judge suitability for their use case, since the model's performance is likely strongest within that domain.