mlfoundations-dev/OpenHermes-2.5-sedrick

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · License: llama3.1 · Architecture: Transformer

OpenHermes-2.5-sedrick is an 8 billion parameter language model fine-tuned from meta-llama/Llama-3.1-8B. Developed by mlfoundations-dev, it leverages the teknium/OpenHermes-2.5 dataset for instruction following capabilities. This model is optimized for general-purpose conversational AI and text generation tasks, building upon the strong base of Llama 3.1.


OpenHermes-2.5-sedrick Overview

OpenHermes-2.5-sedrick is an 8 billion parameter language model developed by mlfoundations-dev. It is a fine-tuned variant of the robust meta-llama/Llama-3.1-8B base model, specifically trained on the teknium/OpenHermes-2.5 dataset. This fine-tuning process aims to enhance its instruction-following and conversational abilities.

Key Training Details

The model was trained for 3 epochs with a learning rate of 5e-06 and a total batch size of 512 distributed across 32 GPUs. Training used the Adam optimizer with betas=(0.9, 0.999) and a constant learning rate schedule with a warmup ratio of 0.1. The final validation loss was 0.6036.
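The reported hyperparameters imply a few derived quantities. A minimal sketch, assuming no gradient accumulation (the card does not state the per-device batch size or total step count, so those are illustrative assumptions):

```python
# Reported training hyperparameters (from the card)
TOTAL_BATCH_SIZE = 512   # total batch size across all GPUs
NUM_GPUS = 32            # number of GPUs used
WARMUP_RATIO = 0.1       # warmup ratio for the constant-with-warmup schedule
EPOCHS = 3

# Assuming no gradient accumulation, each GPU processes
# 512 / 32 = 16 examples per optimizer step.
per_device_batch_size = TOTAL_BATCH_SIZE // NUM_GPUS

def warmup_steps(total_steps: int, warmup_ratio: float = WARMUP_RATIO) -> int:
    """Warmup steps for a constant-with-warmup schedule: a fixed fraction
    of the total optimizer steps spent ramping the learning rate up."""
    return int(total_steps * warmup_ratio)

print(per_device_batch_size)   # 16
print(warmup_steps(1000))      # 100 warmup steps for a hypothetical 1000-step run
```

The per-device batch size of 16 follows directly from the stated totals; the 1000-step run is purely a placeholder to show how the warmup ratio translates into steps.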

Intended Use Cases

While the model card does not detail specific intended uses and limitations, as a Llama 3.1 variant fine-tuned on an instruction dataset, OpenHermes-2.5-sedrick is generally suitable for applications requiring strong language understanding and generation, including:

  • General-purpose chatbots and conversational agents
  • Text summarization and generation
  • Instruction following and question answering
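For the conversational use cases above, models like this are commonly served behind an OpenAI-compatible chat endpoint. A hedged sketch of building such a request payload; the endpoint shape is an assumption about typical hosting platforms, not something documented for this model:

```python
import json

# The model id matches the repository name; everything else here
# (system prompt, sampling settings) is illustrative.
MODEL_ID = "mlfoundations-dev/OpenHermes-2.5-sedrick"

def build_chat_request(user_message: str, max_tokens: int = 256) -> dict:
    """Build a chat-completion payload in the OpenAI-compatible format
    accepted by many inference providers."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

payload = build_chat_request("Summarize this paragraph in two sentences: ...")
print(json.dumps(payload, indent=2))
```

The payload stays well under the model's 32k context window for typical chat turns; longer summarization inputs should be checked against that limit.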