amazingvince/openhermes-7b-dpo
amazingvince/openhermes-7b-dpo is an experimental 7-billion-parameter, DPO-tuned, Mistral-based language model with a 4096-token context length. It builds on OpenHermes 2.5, which extended OpenHermes 2 with additional code datasets during fine-tuning. The model is notable for improved performance on non-code benchmarks such as TruthfulQA, AGIEval, and the GPT4All suite, an effect attributed to its code instruction training, making it suitable for a range of general language understanding and generation tasks.
OpenHermes 2.5 Mistral 7B DPO Tune
This model, amazingvince/openhermes-7b-dpo, is an experimental DPO (Direct Preference Optimization) fine-tune based on the Mistral 7B architecture. It builds upon OpenHermes 2.5, itself an evolution of OpenHermes 2 that incorporated additional code datasets during training.
Key Capabilities
- Enhanced General Reasoning: Training with a significant ratio of code instruction data (estimated at 7-14% of the total dataset) has been shown to boost performance on several non-code benchmarks, including TruthfulQA, AGIEval, and the GPT4All suite.
- Code-Awareness: Benefits from the additional code datasets in its training lineage, which contribute to its improved overall abilities.
- DPO Optimization: Utilizes Direct Preference Optimization with various datasets to refine its responses and capabilities.
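The DPO optimization mentioned above replaces a learned reward model with a simple classification-style loss over preference pairs. As a minimal sketch (the beta value and log-probability inputs below are illustrative, not taken from this model's training configuration), the per-pair objective can be written in plain Python:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy being trained or under the
    frozen reference model (here, the pre-DPO checkpoint).
    """
    # Implicit reward margins: how far the policy has shifted
    # probability mass relative to the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Loss = -log sigmoid(beta * (chosen margin - rejected margin));
    # it falls as the policy favors the chosen response more strongly.
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy matches the reference, both margins are zero and the loss is log 2; favoring the chosen response over the rejected one drives the loss below that baseline.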
Good for
- General language understanding and generation tasks where improved reasoning is beneficial.
- Applications requiring a balance of code and non-code related intelligence.
- Experimentation with DPO-tuned models for diverse NLP challenges.
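For experimentation, prompt formatting matters: the OpenHermes 2.5 family uses the ChatML chat format, and it is a reasonable assumption (not confirmed by this card) that the DPO tune inherits it. A minimal prompt builder under that assumption:

```python
def build_chatml_prompt(messages):
    """Format a list of {'role', 'content'} dicts as a ChatML prompt
    (the format used by the OpenHermes 2.5 family; assumed here to
    carry over to this DPO tune) and append the generation cue."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    # Open an assistant turn so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what DPO does."},
])
```

The resulting string can be passed to any standard text-generation pipeline for the model; if the repository ships a tokenizer chat template, prefer that over hand-rolled formatting.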