sedrickkeh/mistral_openhermes_v3

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: Oct 25, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

sedrickkeh/mistral_openhermes_v3 is a 7-billion-parameter language model fine-tuned from mistralai/Mistral-7B-v0.1. It was trained with a 4096-token context length and reached a final loss of 0.5579 on its evaluation set. Because the fine-tuning dataset is not disclosed, the model's intended specialization is unclear, though the fine-tuning itself suggests it targets a narrower use case than the base model.


Model Overview

sedrickkeh/mistral_openhermes_v3 is a 7-billion-parameter language model fine-tuned from the mistralai/Mistral-7B-v0.1 base model. It was trained with a learning rate of 5e-06 over 3 epochs, with a total batch size of 1024 spread across 8 GPUs, and reached a final validation loss of 0.5579.
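As a quick usage check, the model can be loaded with the standard transformers generation API. This is a minimal sketch, assuming the checkpoint is published on the Hugging Face Hub under the repo id above; the prompt is purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id, taken from the model name above.
model_id = "sedrickkeh/mistral_openhermes_v3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let transformers pick the checkpoint's dtype
    device_map="auto",    # place weights on available GPU(s)/CPU
)

prompt = "Explain the difference between fine-tuning and pretraining."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The 4096-token context is inherited from Mistral-7B-v0.1;
# keep prompt + generated tokens well under that limit.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```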

Training Details

  • Base Model: mistralai/Mistral-7B-v0.1
  • Parameters: 7 billion
  • Context Length: 4096 tokens (inferred from base model)
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Learning Rate Scheduler: Constant with 0.1 warmup ratio
  • Epochs: 3.0
  • Final Validation Loss: 0.5579
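For reference, the hyperparameters above map onto a transformers TrainingArguments configuration roughly as follows. This is a hedged reconstruction, not the author's actual training script: the per-device batch size and gradient-accumulation split is an assumption consistent with the reported total of 1024 across 8 GPUs, the bf16 setting is assumed, and dataset/model wiring is omitted.

```python
from transformers import TrainingArguments

# Sketch of the reported setup. The 1024 total batch size is assumed to come
# from 8 GPUs x (per_device_train_batch_size x gradient_accumulation_steps)
# = 8 x (16 x 8) = 1024; the actual split is not stated in the model card.
training_args = TrainingArguments(
    output_dir="mistral_openhermes_v3",
    learning_rate=5e-06,
    num_train_epochs=3.0,
    per_device_train_batch_size=16,        # assumed split
    gradient_accumulation_steps=8,         # 16 x 8 = 128 per GPU; x8 GPUs = 1024
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,                             # assumed; typical for Mistral fine-tunes
)
```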

Limitations

Detailed information regarding the specific training dataset, intended uses, and limitations is not provided in the model card. Users should exercise caution and conduct further evaluation to determine its suitability for specific applications.