mlfoundations-dev/stackoverflow_5000tasks_.25p

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 8B
  • Quantization: FP8
  • Context Length: 32k
  • License: llama3.1
  • Architecture: Transformer
  • Serving State: Warm

The mlfoundations-dev/stackoverflow_5000tasks_.25p model is an 8-billion-parameter language model fine-tuned from Meta-Llama-3.1-8B on the mlfoundations-dev/stackoverflow_5000tasks_.25p dataset, reaching a final validation loss of 0.5831. Because the fine-tuning data is drawn from the Stack Overflow domain, the model is geared toward programming questions and related technical discussion.


Model Overview

The mlfoundations-dev/stackoverflow_5000tasks_.25p model is derived from Meta-Llama-3.1-8B. It was fine-tuned on the mlfoundations-dev/stackoverflow_5000tasks_.25p dataset for 3 epochs, ending at a validation loss of 0.5831.
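Assuming the checkpoint is published on the Hugging Face Hub under this repository id and follows the standard Llama 3.1 layout, it should load with the transformers library roughly as sketched below; the dtype and device placement are illustrative choices, not settings taken from this card.

```python
# Minimal loading sketch; repository availability, dtype, and device
# placement are assumptions, not stated in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/stackoverflow_5000tasks_.25p"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # pick a dtype your hardware supports
    device_map="auto",           # shards across available devices; needs accelerate
)
```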

Training Details

The model was trained with the following key hyperparameters (a configuration sketch follows the list):

  • Learning Rate: 5e-06
  • Batch Size: 8 per device across 8 GPUs, with 8 gradient accumulation steps, for an effective batch size of 8 × 8 × 8 = 512.
  • Optimizer: AdamW (the adamw_torch implementation) with default betas and epsilon.
  • Epochs: 3.0
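
For reproduction purposes, these settings map onto a Hugging Face Trainer configuration roughly as sketched below. The actual training framework, learning-rate scheduler, and warmup settings are not stated in this card, so everything beyond the listed hyperparameters is an assumption left at library defaults.

```python
# Rough reconstruction of the reported hyperparameters via
# transformers.TrainingArguments; scheduler and warmup values are not
# given in this card and are left at library defaults here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="stackoverflow_5000tasks_.25p",
    learning_rate=5e-6,
    per_device_train_batch_size=8,  # 8 per device x 8 GPUs x 8 accum steps = 512 effective
    gradient_accumulation_steps=8,
    num_train_epochs=3.0,
    optim="adamw_torch",            # AdamW with default betas and epsilon
)
```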

Performance

During training, the validation loss decreased steadily across epochs:

  • Epoch 1: Validation Loss 0.6395
  • Epoch 2: Validation Loss 0.6017
  • Epoch 3: Validation Loss 0.5831

Intended Use Cases

Because it was fine-tuned on a Stack Overflow-derived dataset, this model is best suited to technical question answering, code-snippet explanation, and the kinds of programming discussion found on the Stack Overflow platform. Developers and researchers working on programming-related natural language processing tasks may find it particularly relevant.
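
As a usage illustration, the model could be queried with a Stack Overflow-style question through the transformers pipeline API. This is a sketch only: the prompt and generation settings below are examples chosen for this card, and the call assumes the checkpoint loads from the Hub as shown above.

```python
# Illustrative inference sketch; the prompt and generation settings are
# examples, not recommendations from the model authors.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mlfoundations-dev/stackoverflow_5000tasks_.25p",
    device_map="auto",  # assumes accelerate is installed
)

question = (
    "How do I remove duplicates from a Python list "
    "while preserving the original order?"
)
output = generator(question, max_new_tokens=256, do_sample=False)
print(output[0]["generated_text"])
```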