yihang7/zephyr-7b-sft-full

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: Dec 15, 2023 · License: apache-2.0 · Architecture: Transformer · Open Weights

The yihang7/zephyr-7b-sft-full model is a 7-billion-parameter language model fine-tuned from mistralai/Mistral-7B-v0.1. It was trained on an unspecified dataset and reached a validation loss of 0.9585. The available documentation does not describe its differentiators or intended use cases.


Model Overview

yihang7/zephyr-7b-sft-full is a 7-billion-parameter language model derived from mistralai/Mistral-7B-v0.1. It has undergone supervised fine-tuning (SFT) on an undisclosed dataset.
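Since the checkpoint is a standard Mistral-derived causal language model, it can presumably be loaded with the Hugging Face transformers library. The sketch below assumes the weights are hosted on the Hub under this repo id and follow the usual Mistral layout; the model card itself provides no usage instructions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the checkpoint is available on the Hugging Face Hub under
# this repo id and loads like any other Mistral-7B fine-tune.
model_id = "yihang7/zephyr-7b-sft-full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the dtype stored in the checkpoint
    device_map="auto",    # requires accelerate; places layers on available GPUs
)

prompt = "Explain supervised fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```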

Training Details

This model was trained using the following key hyperparameters:

  • Learning Rate: 2e-05
  • Per-Device Batch Size: 32 (train), 16 (eval)
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Epochs: 1
  • Total Train Batch Size: 512 (per-device batch of 32 × gradient accumulation of 2, implying 8 devices)

During training, the model achieved a validation loss of 0.9585.
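For reference, these hyperparameters map naturally onto Hugging Face TrainingArguments. The sketch below is illustrative only: the actual training script and dataset are not documented, and the 8-device figure is inferred from the arithmetic 32 × 2 × 8 = 512 rather than stated in the model card.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the reported hyperparameters.
# Assumption: values follow the standard HF Trainer conventions
# (per-device batch sizes, Adam betas/epsilon as listed above).
args = TrainingArguments(
    output_dir="zephyr-7b-sft-full",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,   # 32 x 2 x 8 devices = 512 total
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```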

Limitations and Use Cases

The available documentation does not specify intended uses, limitations, or the nature of the fine-tuning dataset, so the model's strengths, ideal applications, and potential weaknesses remain undetermined.