Overview
andysalerno/mistral-sft-v3 is a 7 billion parameter model derived from the Mistral-7B-v0.1 architecture. Its primary distinction is a light fine-tuning pass on the andysalerno/ansalern-nectar-inputoutput dataset, which trains the model to use the ChatML special tokens so that it understands and formats output according to the ChatML specification.
Key Characteristics
- Base Model: Mistral-7B-v0.1
- Parameter Count: 7 billion
- Context Length: 4096 tokens
- ChatML Integration: Specifically trained to follow ChatML formatting, making it suitable for subsequent fine-tuning for chat applications.
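To illustrate what ChatML-formatted input looks like, here is a minimal sketch. The `<|im_start|>`/`<|im_end|>` tokens are the standard ChatML delimiters; the `to_chatml` helper is illustrative and not part of the model's repository.

```python
def to_chatml(messages):
    """Render a list of {"role", "content"} dicts as a ChatML string.

    Each turn is wrapped as:
      <|im_start|>{role}\n{content}<|im_end|>\n
    """
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# Append an open assistant turn so generation continues from the assistant role.
prompt = to_chatml(conversation) + "<|im_start|>assistant\n"
print(prompt)
```

A model trained on this format learns to stop at `<|im_end|>`, which is why having the special tokens in the vocabulary matters for downstream chat fine-tunes.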
Performance Benchmarks
Evaluations on the Open LLM Leaderboard show an average score of 60.93. Notable scores include:
- HellaSwag (10-Shot): 82.23
- MMLU (5-Shot): 63.40
- AI2 Reasoning Challenge (25-Shot): 61.35
Intended Use
This model is not designed as a direct chat model for end-user interaction. Its purpose is to serve as a base that developers can fine-tune into their own ChatML-compliant models. That makes it well suited to projects that need custom chat behaviors or specific instruction-following capabilities on top of a foundation that already knows the ChatML tokens.
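A downstream fine-tune typically turns each record of an input/output dataset into a ChatML prompt/completion pair, computing the loss only on the completion. The sketch below shows that preparation step under assumed conventions; `build_sft_example` is a hypothetical helper, not an API of this model or dataset.

```python
def build_sft_example(user_input, assistant_output):
    """Split one input/output record into a ChatML prompt and completion.

    During fine-tuning, loss is usually masked on the prompt portion and
    computed only on the completion tokens (assistant text + <|im_end|>).
    """
    prompt = (
        f"<|im_start|>user\n{user_input}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    completion = f"{assistant_output}<|im_end|>\n"
    return prompt, completion

prompt, completion = build_sft_example("What is 2 + 2?", "2 + 2 equals 4.")
```

Keeping the `<|im_end|>` token inside the completion teaches the fine-tuned model to emit its own stop marker, which is the behavior this base model's ChatML training is meant to make reliable.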