eren23/ogno-monarch-jaskier-merge-7b-OH-PREF-DPO-v2
The eren23/ogno-monarch-jaskier-merge-7b-OH-PREF-DPO-v2 is a 7-billion-parameter language model fine-tuned with Direct Preference Optimization (DPO) on the OpenHermesPreferences dataset. It iterates on a previous merge, further optimized for preference alignment. With a context length of 8192 tokens, it aims to improve conversational quality and adherence to user preferences, though it is noted as not fully tested and should be used with caution.
Model Overview
The eren23/ogno-monarch-jaskier-merge-7b-OH-PREF-DPO-v2 is a 7-billion-parameter language model that has undergone a further round of Direct Preference Optimization (DPO) fine-tuning. It builds on a previous merge and uses the argilla/OpenHermesPreferences dataset for its DPO training.
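For readers unfamiliar with this kind of training step, a minimal sketch of a DPO run using trl's DPOTrainer follows. It is not the author's actual training script: the base checkpoint path is a placeholder, the beta value is assumed, and the DPOConfig/processing_class interface varies across trl versions.

```python
# Hedged sketch of a DPO fine-tuning run like the one described above.
# Not the author's actual script; assumes trl >= 0.12-style APIs.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "path/to/previous-merge"  # placeholder; the card does not name the exact base checkpoint
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# The preference dataset used for this model's DPO stage. Its chosen/rejected
# columns are conversation-formatted; recent trl versions accept this directly,
# while older ones require flattening to plain prompt/chosen/rejected strings.
dataset = load_dataset("argilla/OpenHermesPreferences", split="train")

config = DPOConfig(
    output_dir="ogno-dpo-v2",
    beta=0.1,                        # assumed DPO temperature; not stated in the card
    max_length=8192,                 # matches the model's context window
    per_device_train_batch_size=1,
)

trainer = DPOTrainer(
    model=model,                     # ref_model defaults to a frozen copy of `model`
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,      # `tokenizer=` in older trl versions
)
trainer.train()
```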
Key Characteristics
- Parameter Count: 7 billion.
- Context Length: Supports an 8192-token context window.
- Fine-tuning: DPO on the argilla/OpenHermesPreferences dataset, aimed at improved preference alignment and conversational quality.
- Development Status: The model is noted as not yet fully tested, and users are advised to proceed with caution.
Performance Metrics
Based on Open LLM Leaderboard evaluations, the model achieves an average score of 76.44 across six benchmarks:
- AI2 Reasoning Challenge (25-shot): 73.12
- HellaSwag (10-shot): 89.07
- MMLU (5-shot): 64.80
- TruthfulQA (0-shot): 77.46
- Winogrande (5-shot): 84.69
- GSM8k (5-shot): 69.52
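These benchmarks are the standard Open LLM Leaderboard suite, which is run with EleutherAI's lm-evaluation-harness. A minimal sketch for spot-checking one score locally follows; the task name and few-shot count mirror the leaderboard's HellaSwag setting, but evaluating a 7B model this way assumes substantial GPU memory and runtime.

```python
# Sketch of reproducing one leaderboard number with lm-evaluation-harness
# (pip install lm-eval); expect long runtimes for a 7B model.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=eren23/ogno-monarch-jaskier-merge-7b-OH-PREF-DPO-v2",
    tasks=["hellaswag"],
    num_fewshot=10,  # the leaderboard's HellaSwag few-shot setting
)
print(results["results"]["hellaswag"])
```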
Usage Considerations
Given its DPO fine-tuning on preference data, this model is a reasonable candidate for tasks that require nuanced conversational responses or alignment with specific user preferences. Because it is not fully tested, however, it is best reserved for experimental use or for scenarios where thorough validation can be performed.
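For experimentation, a minimal generation sketch with Hugging Face transformers is shown below. It assumes a GPU with enough memory for fp16 7B weights and that the repository ships a chat template; the sampling settings are illustrative rather than recommendations from the model card.

```python
# Minimal generation example; settings are illustrative, not card-endorsed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "eren23/ogno-monarch-jaskier-merge-7b-OH-PREF-DPO-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO fine-tuning in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```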