andysalerno/mistral-sft-v3

Text generation
  • Model size: 7B
  • Quantization: FP8
  • Context length: 4k
  • Concurrency cost: 1
  • Published: Jan 30, 2024
  • License: apache-2.0
  • Architecture: Transformer
  • Open weights

andysalerno/mistral-sft-v3 is a 7 billion parameter language model based on Mistral-7B-v0.1 and fine-tuned by andysalerno. The fine-tune adds the ChatML special tokens and lightly trains the model to produce correct ChatML formatting. It is intended primarily as a foundation for further fine-tuning of ChatML-based models rather than as a direct chat model.


Overview

andysalerno/mistral-sft-v3 is a 7 billion parameter model derived from the Mistral-7B-v0.1 architecture. Its distinguishing feature is a light fine-tune on the andysalerno/ansalern-nectar-inputoutput dataset with the ChatML special tokens added, so the model reliably understands and emits output that follows the ChatML specification.
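
For reference, ChatML wraps each conversation turn in the <|im_start|> and <|im_end|> special tokens, with the speaker role on the opening line. A short exchange rendered in ChatML looks like this:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What does ChatML look like?<|im_end|>
<|im_start|>assistant
Each turn is wrapped in <|im_start|> and <|im_end|> special tokens.<|im_end|>
```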

Key Characteristics

  • Base Model: Mistral-7B-v0.1
  • Parameter Count: 7 billion
  • Context Length: 4096 tokens
  • ChatML Integration: Specifically trained to follow ChatML formatting, making it suitable for subsequent chat-oriented fine-tuning (see the inference sketch after this list).
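
The snippet below is a minimal inference sketch using the Hugging Face transformers library. It assumes the model's tokenizer ships a ChatML chat template, consistent with the fine-tuning described above; adjust the loading options to your hardware.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "andysalerno/mistral-sft-v3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize ChatML in one sentence."},
]

# apply_chat_template renders the turns with the ChatML special tokens
# (<|im_start|> / <|im_end|>) this model was fine-tuned to expect.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```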

Performance Benchmarks

Evaluations on the Open LLM Leaderboard show an average score of 60.93. Notable scores include:

  • HellaSwag (10-Shot): 82.23
  • MMLU (5-Shot): 63.40
  • AI2 Reasoning Challenge (25-Shot): 61.35

Intended Use

This model is not designed as a direct chat model for end-user interaction. Instead, its core purpose is to serve as a robust base for developers to fine-tune their own models that require adherence to ChatML formatting. This makes it ideal for projects where custom chat behaviors or specific instruction-following capabilities are needed on top of a ChatML-compliant foundation.
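
As one illustration of that workflow, the sketch below renders raw prompt/response records into ChatML training strings via the tokenizer's chat template. The "prompt" and "response" field names are hypothetical placeholders, not fields of this model's training data.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("andysalerno/mistral-sft-v3")

def to_chatml(example: dict) -> dict:
    """Render one hypothetical training record into the ChatML layout."""
    messages = [
        {"role": "user", "content": example["prompt"]},
        {"role": "assistant", "content": example["response"]},
    ]
    # tokenize=False returns the formatted string rather than token ids.
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

print(to_chatml({"prompt": "Name a prime number.", "response": "7 is prime."})["text"])
```

The resulting "text" field can then feed a standard supervised fine-tuning loop (for example, via a datasets .map call), which is the kind of downstream use a ChatML-compliant base like this one is meant to support.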