mlfoundations-dev/top_7_ranking_stackexchange

8B parameters · FP8 · Jan 7, 2025 · License: llama3.1

Model Overview

mlfoundations-dev/top_7_ranking_stackexchange is a fine-tuned variant of Meta-Llama-3.1-8B, adapted on the dataset of the same name.

Key Characteristics

  • Base Model: Meta-Llama-3.1-8B.
  • Fine-tuning Objective: Optimized for ranking tasks, likely within the StackExchange context based on its training data.
  • Performance: Achieved a final validation loss of 0.8129 during training.

Training Details

The model was trained with the following key hyperparameters:

  • Learning Rate: 5e-06
  • Batch Size: 8 per device (train and eval); the effective train batch size was 512 after gradient accumulation.
  • Epochs: 3.0
  • Optimizer: AdamW with default betas and epsilon.
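The jump from a per-device batch of 8 to an effective batch of 512 comes from multiplying the per-device batch size by the gradient-accumulation steps and the device count. The card only states the product (512); the particular accumulation/device split below is hypothetical:

```python
per_device_train_batch_size = 8   # stated in the card
gradient_accumulation_steps = 8   # hypothetical split
num_devices = 8                   # hypothetical split

# Effective batch size = per-device batch x accumulation steps x devices
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # 512
```

Any split whose product is 512 (e.g. 64 accumulation steps on a single device) would yield the same effective batch size.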

Intended Uses

This model is best suited for applications that rank information or content, particularly in domains similar to StackExchange, where its specialized fine-tuning is most likely to transfer.
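One common way to use a fine-tuned language model for ranking is to score each candidate (e.g. by the model's average token log-probability for the candidate given the question) and sort by that score. The sketch below is a hypothetical helper, not the card's documented interface; the stand-in scorer would be replaced by a call to the fine-tuned model:

```python
def rank_candidates(question, candidates, score_fn):
    """Rank candidate answers from best to worst.

    score_fn(question, candidate) should return a higher value for a
    better answer, e.g. the model's average token log-probability of
    the candidate conditioned on the question.
    """
    return sorted(candidates, key=lambda c: score_fn(question, c), reverse=True)

# Usage with a stand-in scorer (a real deployment would score each
# candidate with the fine-tuned model instead of a lookup table):
mock_scores = {"answer A": 0.2, "answer B": 0.9, "answer C": 0.5}
ranked = rank_candidates("q", list(mock_scores), lambda q, c: mock_scores[c])
print(ranked)  # ['answer B', 'answer C', 'answer A']
```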