Corianas/Neural-Mistral-7B: DPO Fine-tune of Mistral-7B-Instruct-v0.2
Corianas/Neural-Mistral-7B is a 7-billion-parameter instruction-tuned model developed by Corianas and built on mistralai/Mistral-7B-Instruct-v0.2. It was fine-tuned with Direct Preference Optimization (DPO), following a method described in a Towards Data Science article, to improve its instruction-following behavior.
Key Capabilities & Features
- Instruction Following: Optimized for generating responses that adhere to user instructions, leveraging the DPO fine-tuning approach.
- Mistral Architecture: Inherits the efficient architecture of Mistral-7B-v0.1, including:
  - Grouped-Query Attention: Improves inference speed and reduces memory usage.
  - Sliding-Window Attention: Enables handling longer sequences more efficiently.
  - Byte-fallback BPE tokenizer: Provides robust tokenization.
- Chat Template Support: Designed to work with the standard Mistral instruction format, which uses [INST] and [/INST] tokens, and is compatible with Hugging Face's apply_chat_template() method.
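To make the instruction format concrete, the following is a minimal hand-written sketch of how a conversation could be flattened into a Mistral-style prompt. The helper name `build_mistral_prompt`, the `<s>` and `</s>` token strings, and the exact whitespace are assumptions for illustration only; in practice the tokenizer's own apply_chat_template() method is the authoritative source for the format.

```python
def build_mistral_prompt(messages, bos_token="<s>", eos_token="</s>"):
    """Illustrative approximation of the Mistral [INST] ... [/INST] format.

    `messages` is a list of {"role": ..., "content": ...} dicts that must
    alternate user/assistant, starting with a user turn.
    """
    prompt = bos_token
    for index, message in enumerate(messages):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError("Roles must alternate user/assistant, starting with user.")
        if message["role"] == "user":
            # User turns are wrapped in the instruction tokens.
            prompt += "[INST] " + message["content"] + " [/INST]"
        else:
            # Assistant turns are closed with the end-of-sequence token.
            prompt += message["content"] + eos_token
    return prompt


messages = [
    {"role": "user", "content": "What is DPO?"},
    {"role": "assistant", "content": "A preference-based fine-tuning method."},
    {"role": "user", "content": "Name one dataset used for it."},
]
prompt = build_mistral_prompt(messages)
print(prompt)
```

With a real tokenizer loaded through Hugging Face Transformers, passing the same `messages` list to apply_chat_template() should make a helper like this unnecessary.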
Training Details
The model was trained on the Intel/orca_dpo_pairs dataset. Training ran for 200 steps with a learning rate of 5e-5, the paged_adamw_32bit optimizer, and bf16 precision. This DPO fine-tuning aims to align the model's outputs more closely with human preferences.
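As a rough illustration of what DPO optimizes, the sketch below computes the standard DPO loss for a single preference pair from summed log-probabilities. It is not the training code used for this model: the function name `dpo_loss` and the `beta` value of 0.1 are assumptions, since the source does not state the beta that was used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a response under the
    policy being trained or under the frozen reference model.
    beta=0.1 is an assumed value, not one taken from the model card.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# The loss drops below log(2) when the policy prefers the chosen response
# more strongly than the reference model does.
neutral = dpo_loss(0.0, 0.0, 0.0, 0.0)
improved = dpo_loss(-1.0, -5.0, -2.0, -2.0)
worsened = dpo_loss(-5.0, -1.0, -2.0, -2.0)
print(neutral, improved, worsened)
```

In a dataset such as Intel/orca_dpo_pairs, each example supplies a prompt with a chosen and a rejected response; the loss above would be averaged over a batch of such pairs.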
Good for
- Conversational AI: Intended for generating coherent and contextually relevant responses in chat-based interactions.
- General Instruction-Following: Suitable for a wide range of tasks requiring precise adherence to prompts.
- Research and Development: Provides a DPO-tuned Mistral variant for further experimentation and application development.