RikkiXu/zephyr-7b-sft-full

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Apr 19, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

RikkiXu/zephyr-7b-sft-full is a 7-billion-parameter language model fine-tuned from mistralai/Mistral-7B-v0.1. It was optimized through supervised fine-tuning (SFT) on a dataset its card identifies only as "generator", with the goal of improving its text generation capabilities. Building on the strong foundation of the Mistral 7B architecture, it is suited to applications that require robust, coherent text output.


Model Overview

Derived from the mistralai/Mistral-7B-v0.1 base model, this checkpoint was refined with supervised fine-tuning (SFT) on the "generator" dataset to improve the quality of the text it produces.
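A minimal loading sketch, assuming the standard Hugging Face Transformers API; the dtype and device settings below are illustrative choices, not values taken from the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RikkiXu/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is a common choice for Mistral-7B weights
    device_map="auto",           # requires accelerate; spreads layers across available devices
)

prompt = "Write a short overview of supervised fine-tuning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```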

Training Details

The model was trained with a learning rate of 2e-05, a per-device train batch size of 16, and a per-device eval batch size of 8 across 8 devices, giving an effective training batch size of 128. Training ran for 1 epoch with an Adam optimizer, a cosine learning-rate schedule, and a warmup ratio of 0.1; the reported final validation loss is 0.9406.
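For reference, those hyperparameters map onto transformers.TrainingArguments roughly as follows. This is an illustrative sketch, not the author's published training script; the output_dir, optim, and bf16 values are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-sft-full",  # assumption: output path is not documented
    learning_rate=2e-5,
    per_device_train_batch_size=16,   # 16 per device x 8 devices = 128 effective
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",              # assumption: an Adam-family optimizer, as reported
    bf16=True,                        # assumption: mixed precision on multi-GPU hardware
)
```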

Framework Versions

Key frameworks used include Transformers 4.41.1, PyTorch 2.1.2+cu118, Datasets 2.16.1, and Tokenizers 0.19.1.
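To check a local environment against those versions, a small sketch (these packages all expose a `__version__` attribute):

```python
import transformers, torch, datasets, tokenizers

# Versions reported for this model's training environment.
expected = {
    "transformers": "4.41.1",
    "torch": "2.1.2+cu118",
    "datasets": "2.16.1",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    print(f"{name}: installed {installed[name]}, trained with {want}")
```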

Intended Uses

This model is primarily intended for applications that benefit from a fine-tuned Mistral 7B variant, particularly where stronger text generation is desired; the SFT stage on the "generator" dataset targets exactly that. A chat-style usage sketch follows.
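The sketch below assumes the tokenizer ships a chat template, which is common for Zephyr SFT variants but is not confirmed by this model card; if apply_chat_template raises an error, fall back to plain-text prompting as in the loading example above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RikkiXu/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Assumption: a chat template is defined for this tokenizer.
messages = [{"role": "user", "content": "Summarize what supervised fine-tuning changes in a base model."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```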