mlfoundations-dev/openthoughts3_1k_llama3

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · License: llama3.1 · Architecture: Transformer

mlfoundations-dev/openthoughts3_1k_llama3 is an 8 billion parameter language model fine-tuned from Meta's Llama-3.1-8B-Instruct. It was trained on the mlfoundations-dev/openthoughts3_1k dataset, so its behavior reflects the characteristics of that fine-tuning data. With a 32768-token context length, it suits tasks that benefit from long-range context. Its primary utility is as a Llama-3.1-8B-Instruct base adapted to the openthoughts3_1k dataset.


Overview

openthoughts3_1k_llama3 is an 8 billion parameter language model built on Meta's Llama-3.1-8B-Instruct. It was fine-tuned on the mlfoundations-dev/openthoughts3_1k dataset, adapting the base model to the patterns present in that data. The model supports a context length of 32768 tokens, allowing it to process and generate long sequences of text.

Training Details

The fine-tuning run used a learning rate of 2e-05 with a cosine learning-rate schedule, a warmup ratio of 0.1, and 7.0 epochs. Training ran on 16 GPUs with a total batch size of 96, i.e. a per-device batch size of 1 with 6 gradient accumulation steps (16 × 1 × 6 = 96). The optimizer was AdamW (the adamw_torch implementation).
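
For reference, here is a minimal sketch of how these hyperparameters map onto Hugging Face `TrainingArguments`. The actual training stack is not specified on this card, and the output directory and precision setting are illustrative assumptions, not values from the original run:

```python
from transformers import TrainingArguments

# Hyperparameters reported on this card, expressed as TrainingArguments.
# output_dir and bf16 are assumptions for illustration only.
training_args = TrainingArguments(
    output_dir="openthoughts3_1k_llama3",  # hypothetical path
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=7.0,
    per_device_train_batch_size=1,   # 16 GPUs x 1 x 6 accumulation = 96 total
    gradient_accumulation_steps=6,
    optim="adamw_torch",
    bf16=True,  # assumption: common default for Llama-3.1 fine-tuning
)
```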

Key Characteristics

  • Base Model: Meta Llama-3.1-8B-Instruct
  • Parameter Count: 8 Billion
  • Context Length: 32768 tokens
  • Fine-tuning Dataset: mlfoundations-dev/openthoughts3_1k

Potential Use Cases

This model suits applications that build on the capabilities of the Llama-3.1-8B-Instruct base, shaped by fine-tuning on the openthoughts3_1k dataset. Its 32k context window makes it a candidate for tasks requiring contextual understanding over long inputs.
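
As a sketch, the model can be loaded through the standard `transformers` API like any Llama-3.1 checkpoint. The prompt content and generation settings below are illustrative, not values recommended by this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/openthoughts3_1k_llama3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama-3.1-Instruct derivatives use a chat template, so format the
# prompt through the tokenizer rather than passing raw text.
messages = [{"role": "user", "content": "Explain gradient accumulation briefly."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```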