HenryJJ/dolphin-2.6-mistral-7b-dpo-orca
Model Overview
HenryJJ/dolphin-2.6-mistral-7b-dpo-orca is a 7 billion parameter English language model, fine-tuned by HenryJJ. It is based on the Mistral 7B architecture and was developed using Direct Preference Optimization (DPO) from the cognitivecomputations/dolphin-2.6-mistral-7b base model. The training process involved 1200 steps on the Intel/orca_dpo_pairs dataset, utilizing a 1024 token context window.
Key Characteristics
- Architecture: Mistral 7B, a decoder-only (auto-regressive) transformer in the Llama-style family.
- Training Method: DPO (Direct Preference Optimization) for enhanced instruction following.
- Dataset: Trained on Intel/orca_dpo_pairs.
- Context Window: 1024 tokens during training.
- Prompt Format: Employs the ChatML format, with <|im_end|> mapping to token_id 2, ensuring compatibility with applications that expect the EOS token to be token_id 2.
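To make the prompt format concrete, the ChatML layout described above can be sketched as a small helper. The helper name and the example system message are illustrative, not part of the model card; only the `<|im_start|>` / `<|im_end|>` token layout comes from the ChatML format itself.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a single-turn ChatML prompt.

    Illustrative sketch: the special tokens follow the ChatML format the
    card specifies; the function itself is a hypothetical convenience.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
```

The prompt ends with an open `<|im_start|>assistant` turn so that generation continues as the assistant, and stops when the model emits `<|im_end|>` (token_id 2, the EOS).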
Intended Use Cases
This model is primarily suited for chat-based applications and tasks requiring robust instruction following in English. Its DPO training aims to improve response quality and adherence to user prompts, making it suitable for conversational AI and assistant roles.