Model Overview
aqweteddy/mistral_tv-neural-marconroni is a 7-billion-parameter language model built on the Mistral 7B architecture. Its core innovation is the application of a "chat vector" method, detailed in the paper "Chat Vector: A Simple Approach to Equip LLMs with New Language Chat Capabilities". The technique efficiently equips LLMs with chat capabilities in new languages and aligns them with human preferences, particularly for non-English languages.
Key Capabilities & Approach
- Chat Vector Methodology: The model uses a computationally efficient "chat vector", obtained by subtracting a pre-trained base model's weights from its chat-tuned counterpart's weights (e.g. LLaMA2-chat minus LLaMA2). Adding this vector to a model continually pretrained on the target language replaces the conventional pipeline of continual pretraining followed by supervised fine-tuning and alignment with a simpler continual pretraining + chat vector scheme.
- Multilingual Chat: Primarily focused on enhancing conversational abilities in non-English languages, with empirical studies conducted in Traditional Chinese, Korean, and Simplified Chinese.
- Human Preference Alignment: Emphasizes aligning LLMs with human preferences in terms of toxicity, instruction following, and multi-turn dialogue.
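The weight arithmetic behind the chat vector can be illustrated with a minimal sketch. Toy linear layers stand in for the full LLM checkpoints; the names `base`, `chat`, and `target` are illustrative placeholders, not the model's actual checkpoints, but the element-wise arithmetic is the same at scale:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the three checkpoints involved. In practice these would
# be full LLM state dicts: a base model, its chat-tuned variant, and a model
# continually pretrained on the target language.
def make_model(seed):
    torch.manual_seed(seed)
    return nn.Linear(4, 4)

base = make_model(0)    # pre-trained base (e.g. the original Mistral/LLaMA2)
chat = make_model(1)    # chat-tuned variant of the same base
target = make_model(2)  # base continually pretrained on the new language

# Chat vector: element-wise difference between chat and base weights.
chat_vector = {
    name: chat.state_dict()[name] - base.state_dict()[name]
    for name in base.state_dict()
}

# Equip the target-language model with chat ability by adding the vector.
merged_state = {
    name: target.state_dict()[name] + chat_vector[name]
    for name in target.state_dict()
}
target.load_state_dict(merged_state)
```

Because the vector is a simple tensor difference, no gradient updates are needed at merge time, which is what makes the approach cheap compared with a full SFT + RLHF pipeline.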
Performance
On the Open LLM Leaderboard, the model achieves an average score of 71.27, with the following benchmark results:
- AI2 Reasoning Challenge (25-shot): 69.20
- HellaSwag (10-shot): 86.26
- MMLU (5-shot): 65.07
- TruthfulQA (0-shot): 60.03
- Winogrande (5-shot): 80.90
- GSM8k (5-shot): 66.19
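The reported 71.27 average is consistent with an unweighted arithmetic mean of the six benchmark scores, which is how the leaderboard aggregates results. A quick check:

```python
# Open LLM Leaderboard benchmark scores reported for the model.
scores = [69.20, 86.26, 65.07, 60.03, 80.90, 66.19]

# Unweighted arithmetic mean; the leaderboard rounds this to 71.27.
average = sum(scores) / len(scores)
print(f"{average:.3f}")
```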
Use Cases
This model is particularly well-suited for applications requiring robust conversational AI in non-English languages, especially those focusing on Traditional Chinese, Korean, and Simplified Chinese. Its efficient alignment method makes it a strong candidate for developing chatbots and dialogue systems where human preference and instruction following are critical.