Overview
HenryJJ/dolphin-2.6-mistral-7b-dpo-orca-v2 is a 7-billion-parameter, English-language autoregressive model fine-tuned by HenryJJ. It is derived from cognitivecomputations/dolphin-2.6-mistral-7b, a Mistral 7B fine-tune, and uses a Llama 2 style decoder-only transformer architecture. The model was trained with Direct Preference Optimization (DPO) for 1200 steps on the Intel/orca_dpo_pairs dataset, which pairs a preferred ("chosen") and a dispreferred ("rejected") response for each Orca-style prompt. Training used a 1024-token context window; at inference the model supports a context of up to 4096 tokens.
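To make the DPO objective concrete, here is a minimal sketch of the per-pair loss in plain Python. It is not the author's training code: the function name `dpo_loss`, the argument names, and the default `beta=0.1` are assumptions chosen for illustration, and the inputs are assumed to be summed log-probabilities of each response under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio))).

    beta=0.1 is an illustrative assumption, not a value taken from the model card.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model, in log space.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Policy prefers the chosen response more than the reference does: lower loss.
improved = dpo_loss(-5.0, -15.0, -10.0, -10.0)
```

The loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model, which is the behavior preference-pair training relies on.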
Key Characteristics
- Architecture: Llama 2 style decoder-only transformer, inherited from the Mistral 7B base of dolphin-2.6-mistral-7b.
- Training Method: Fine-tuned using Direct Preference Optimization (DPO).
- Dataset: Leveraged Intel/orca_dpo_pairs for DPO training.
- Context Window: Trained with a 1024-token context window, supporting 4096 tokens for usage.
- Prompt Format: Utilizes the ChatML prompt format, with <|im_end|> mapping to token_id 2 for compatibility with applications expecting EOS token_id 2.
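As an illustration of the ChatML prompt format, the following sketch assembles a prompt string by hand. The helper name `build_chatml_prompt` and the example system message are hypothetical; in practice a tokenizer's built-in chat template, if the model ships one, may be the more reliable route.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|> markers.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Summarize DPO in one sentence."},
])
print(prompt)
```

Because <|im_end|> maps to token_id 2, generation should stop at the end of the assistant turn in applications that treat token_id 2 as EOS.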
Intended Use Cases
This model is well-suited for applications requiring:
- Instruction Following: Tuned to comply with user requests and provide detailed answers.
- Conversational AI: Designed to act as a helpful and compliant AI assistant.
- General Text Generation: Capable of generating coherent and contextually relevant English text based on prompts.