The mlfoundations-dev/llama3-1_8b_4o_annotated_aops model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct (note that, despite the Llama 3.1 8B reference in its name, the stated base model is Qwen2.5). It was trained on the mlfoundations-dev/4o_annotated_aops dataset, indicating an optimization for tasks involving annotated mathematical problems or similarly structured data, making it a specialized model rather than a general-purpose one.
Model Overview
This model, llama3-1_8b_4o_annotated_aops, is a fine-tuned variant of Qwen/Qwen2.5-7B-Instruct. It has approximately 7.6 billion parameters and supports a context length of 131,072 tokens. What primarily differentiates it is its specialized training on the mlfoundations-dev/4o_annotated_aops dataset.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen2.5-7B-Instruct.
- Parameter Count: 7.6 billion parameters.
- Context Length: Supports an extensive context window of 131,072 tokens.
- Specialized Training: Underwent fine-tuning on the mlfoundations-dev/4o_annotated_aops dataset, suggesting an emphasis on tasks related to annotated mathematical problems or structured data analysis (see the loading sketch after this list).
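As a minimal sketch of how the model might be loaded for local experimentation, assuming the checkpoint follows the standard Hugging Face transformers layout under the model ID above (untested against this specific repository):

```python
# Minimal loading sketch; assumes a standard Qwen2.5-style checkpoint
# on the Hugging Face Hub. device_map="auto" requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/llama3-1_8b_4o_annotated_aops"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # load in the checkpoint's native precision
    device_map="auto",   # shard across available GPUs if needed
)
# Per the model card, the context window is 131,072 tokens.
```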
Training Details
Training was conducted with a learning rate of 1e-05, a total batch size of 96, and a cosine learning-rate scheduler with a warmup ratio of 0.1 over 3 epochs. The optimizer was AdamW with standard betas and epsilon. Training ran on a multi-GPU setup with 32 devices, which implies a per-device batch size of 3 if no gradient accumulation was used (see the reconstruction below).
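The training framework is not stated here, but as an illustrative reconstruction, the reported hyperparameters could be written down with transformers' TrainingArguments as follows (argument names are standard; the exact flags used in the original run are assumptions):

```python
# Hypothetical reconstruction of the reported hyperparameters using
# Hugging Face TrainingArguments; the original framework and exact
# flags are not documented in this card.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama3-1_8b_4o_annotated_aops",
    learning_rate=1e-5,             # reported learning rate
    per_device_train_batch_size=3,  # 3 x 32 devices = total batch size 96
    num_train_epochs=3,             # reported number of epochs
    lr_scheduler_type="cosine",     # cosine schedule
    warmup_ratio=0.1,               # reported warmup ratio
    optim="adamw_torch",            # AdamW with default betas/epsilon
)
```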
Potential Use Cases
Given its fine-tuning on a specific annotated dataset, this model is likely best suited for:
- Tasks requiring understanding or generation based on structured, annotated data.
- Applications in domains similar to the 4o_annotated_aops dataset, potentially involving mathematical reasoning or problem-solving with explicit annotations.
Users should consider the specific nature of the 4o_annotated_aops dataset to determine suitability for their particular use case, since the model's performance is likely strongest within that domain.
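For illustration, a hedged inference sketch on a math-style prompt, reusing the tokenizer and model from the loading sketch above. The prompt is invented, and the example assumes the tokenizer ships a Qwen2.5-style chat template, which is typical for Qwen2.5-Instruct derivatives:

```python
# Hypothetical inference example; the prompt is illustrative and the
# chat template is assumed to be inherited from Qwen2.5-Instruct.
messages = [
    {"role": "user", "content": "Find all real x such that x^2 - 5x + 6 = 0."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```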