Corianas/Neural-Mistral-7B
Corianas/Neural-Mistral-7B is a 7 billion parameter language model developed by Corianas, fine-tuned from Mistral-7B-Instruct-v0.2 using Direct Preference Optimization (DPO). This model is designed for instruction-following tasks, leveraging Grouped-Query Attention and Sliding-Window Attention for efficient processing. It is optimized for conversational AI and general-purpose text generation, building upon the robust Mistral architecture.
Corianas/Neural-Mistral-7B: DPO Fine-tune of Mistral-7B-Instruct-v0.2
Corianas/Neural-Mistral-7B is a 7 billion parameter instruction-tuned model developed by Corianas, building upon the mistralai/Mistral-7B-Instruct-v0.2 base model. This version has been fine-tuned using Direct Preference Optimization (DPO), a method detailed in a Towards Data Science article, to enhance its instruction-following capabilities.
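As a rough illustration of what DPO optimizes, the per-pair loss can be sketched in plain Python. This is the standard DPO objective, not code taken from this model's training run. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for the example; the card does not state the value of beta that was used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the policy or the frozen reference model.
    beta=0.1 is an illustrative default, not a value from the model card.
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_gain - rejected_gain)

    # -log(sigmoid(x)) equals softplus(-x); computed in a numerically stable way.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# The loss falls as the policy shifts probability toward the chosen response.
neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(neutral > improved)
```

When the policy matches the reference, the margin is zero and the loss equals log 2; it decreases as the policy favors the chosen response over the rejected one more strongly than the reference does.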
Key Capabilities & Features
- Instruction Following: Optimized for generating responses that adhere to user instructions, leveraging the DPO fine-tuning approach.
- Mistral Architecture: Inherits the efficient architecture of Mistral-7B-v0.1, including:
  - Grouped-Query Attention: Improves inference speed and reduces memory usage.
  - Sliding-Window Attention: Enables handling longer sequences more efficiently.
  - Byte-fallback BPE tokenizer: Provides robust tokenization.
- Chat Template Support: Designed to work seamlessly with the standard Mistral instruction format, using [INST] and [/INST] tokens, and is compatible with Hugging Face's apply_chat_template() method.
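To make the instruction format concrete, here is a minimal sketch that assembles a prompt by hand. The helper `build_mistral_prompt` is hypothetical, and the `<s>` and `</s>` defaults are assumed to be the tokenizer's BOS and EOS strings. Exact whitespace may differ from the template shipped with the tokenizer, so treat the tokenizer's own output as authoritative.

```python
def build_mistral_prompt(messages, bos="<s>", eos="</s>"):
    """Assemble a Mistral-style instruction prompt from chat messages.

    `messages` is a list of {"role": ..., "content": ...} dicts that must
    alternate user/assistant, starting with user. Hypothetical helper for
    illustration; whitespace is an approximation of the real template.
    """
    prompt = bos
    for i, message in enumerate(messages):
        expected_role = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            # User turns are wrapped in the instruction tokens.
            prompt += f"[INST] {message['content']} [/INST]"
        else:
            # Assistant turns are closed with the end-of-sequence token.
            prompt += f"{message['content']}{eos}"
    return prompt


chat = [
    {"role": "user", "content": "What is DPO?"},
    {"role": "assistant", "content": "A preference-tuning method."},
    {"role": "user", "content": "Summarize it in one line."},
]
print(build_mistral_prompt(chat))
```

In practice, loading the tokenizer with Hugging Face transformers and calling `tokenizer.apply_chat_template(chat, tokenize=False)` is the safer route, since it uses the template stored with the model rather than a hand-written approximation.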
Training Details
The model was trained on the Intel/orca_dpo_pairs dataset. Training ran for 200 steps with a learning rate of 5e-5, the paged_adamw_32bit optimizer, and bf16 precision. This DPO fine-tuning aims to align the model's outputs more closely with human preferences.
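The stated hyperparameters can be collected into a small configuration sketch. The key names below follow the field names of `transformers.TrainingArguments` as an assumption about how the run was set up; the card does not show the actual training script. Settings the card does not mention (batch size, DPO beta, any adapter configuration) are deliberately left out rather than guessed.

```python
# Dataset named in the card.
DATASET_ID = "Intel/orca_dpo_pairs"

# Only values stated in the card; key names mirror TrainingArguments
# fields as an assumption, not a copy of the original script.
dpo_training_config = {
    "learning_rate": 5e-5,
    "optim": "paged_adamw_32bit",
    "bf16": True,
    "max_steps": 200,
}

for name, value in dpo_training_config.items():
    print(f"{name} = {value}")
```

A dict like this could be unpacked into a training-arguments object alongside whatever unstated settings a reproduction attempt chooses to supply.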
Good for
- Conversational AI: Aimed at generating coherent, contextually relevant responses in chat-based interactions.
- General Instruction-Following: Suitable for a wide range of tasks requiring precise adherence to prompts.
- Research and Development: Provides a DPO-tuned Mistral variant for further experimentation and application development.