mlfoundations-dev/oh_scale_x.5_compute_equal

Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · License: llama3.1 · Architecture: Transformer

mlfoundations-dev/oh_scale_x.5_compute_equal is an 8-billion-parameter language model fine-tuned from Meta's Llama-3.1-8B. It was trained on the mlfoundations-dev/oh-dcft-v1.3_no-curation_gpt-4o-mini_scale_0.5x dataset and reached a validation loss of 2.4058. As the names suggest, it belongs to a series of Llama-3.1 fine-tunes that vary the training-data scale (here 0.5x) under a compute-equal budget, so its behavior reflects the data distribution of that dataset.


Model Overview

mlfoundations-dev/oh_scale_x.5_compute_equal is an 8-billion-parameter language model derived from the meta-llama/Meta-Llama-3.1-8B base model and fine-tuned on the mlfoundations-dev/oh-dcft-v1.3_no-curation_gpt-4o-mini_scale_0.5x dataset.
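
A minimal loading-and-generation sketch using the Transformers library is shown below; the dtype, device placement, and generation settings are illustrative assumptions rather than values taken from this card.

```python
# Minimal usage sketch with Hugging Face Transformers. The dtype, device
# placement, and generation settings are assumptions, not values from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/oh_scale_x.5_compute_equal"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a single GPU
    device_map="auto",
)

prompt = "Explain the difference between pretraining and supervised fine-tuning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```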

Key Characteristics

  • Base Model: Meta-Llama-3.1-8B, indicating a strong foundation in general language understanding and generation.
  • Parameter Count: 8 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a context window of 32,768 tokens, enabling processing of longer inputs and generating more coherent extended outputs.
  • Training Data: Fine-tuned on oh-dcft-v1.3_no-curation_gpt-4o-mini_scale_0.5x, suggesting specialization toward the characteristics and patterns present in that dataset.
  • Performance Metric: Achieved a validation loss of 2.4058 on its evaluation set, providing a quantitative measure of its training efficacy (see the sketch after this list).
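
Assuming the reported loss is the usual mean next-token cross-entropy in nats, it corresponds to a perplexity of roughly e^2.4058 ≈ 11.1. The sketch below shows how such a loss is typically computed with Transformers; the input text is a placeholder, not the model's actual evaluation split.

```python
# Sketch of how a validation loss like 2.4058 is typically computed: mean
# next-token cross-entropy (in nats) over held-out text. The input below is a
# placeholder, not the model's actual evaluation split.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/oh_scale_x.5_compute_equal"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

text = "A held-out sample from the evaluation split would go here."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"loss = {loss.item():.4f}  perplexity = {torch.exp(loss).item():.2f}")
```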

Training Details

The model underwent 25 epochs of training with a learning rate of 5e-06 and a total batch size of 512 (8 devices × 8 gradient accumulation steps × a per-device batch size of 8). Training used the adamw_torch optimizer (PyTorch's AdamW implementation) with a constant learning-rate scheduler.
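
Translated into Hugging Face TrainingArguments, that configuration would look roughly like the sketch below; the per-device batch size is inferred from the totals above, and anything not stated on this card (output directory, precision, logging cadence) is an assumption.

```python
# Sketch of TrainingArguments mirroring the reported hyperparameters; values
# not stated above (output_dir, bf16, logging cadence) are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="oh_scale_x.5_compute_equal",  # assumption
    num_train_epochs=25,                      # reported: 25 epochs
    learning_rate=5e-6,                       # reported: 5e-06
    per_device_train_batch_size=8,            # inferred: 512 / (8 devices * 8 accum steps)
    gradient_accumulation_steps=8,            # reported
    lr_scheduler_type="constant",             # reported: constant scheduler
    optim="adamw_torch",                      # reported: AdamW_TORCH
    bf16=True,                                # assumption
    logging_steps=10,                         # assumption
)
```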

Potential Use Cases

Given its fine-tuning on a specific dataset, this model is likely best suited for applications that align with the data distribution and tasks represented in mlfoundations-dev/oh-dcft-v1.3_no-curation_gpt-4o-mini_scale_0.5x. Developers should evaluate its performance on their specific tasks to determine suitability.
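
One way to run such an evaluation is with EleutherAI's lm-evaluation-harness, which is not part of this card and is used here purely as an illustrative assumption; the benchmark tasks listed are likewise placeholders for whatever tasks matter to your application.

```python
# Hypothetical evaluation sketch using EleutherAI's lm-evaluation-harness
# (pip install lm-eval); the tasks listed are placeholders, not benchmarks
# reported for this model.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mlfoundations-dev/oh_scale_x.5_compute_equal",
    tasks=["hellaswag", "arc_easy"],  # swap in tasks that match your use case
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```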