h2m/mhm-7b-v1.3-DPO-1
h2m/mhm-7b-v1.3-DPO-1 is a 7 billion parameter language model developed by h2m, fine-tuned using DPO on the Intel/orca_dpo_pairs dataset. Based on the Mistral architecture, this model is the result of multiple merges involving seven different models from the openllm leaderboard. It offers an 8192 token context length and is primarily an experimental model for general language tasks.
Overview
h2m/mhm-7b-v1.3-DPO-1 is an experimental 7 billion parameter language model, building upon the Mistral architecture. It was created by h2m through a series of merges involving seven distinct models sourced from the openllm leaderboard, utilizing the dare_ties merging technique. The model has been further fine-tuned using Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs dataset.
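The DPO objective referred to above can be sketched numerically. This is a generic illustration of the standard DPO loss for a single preference pair, not h2m's training code; the function name and the beta value of 0.1 are illustrative, not values reported for this model.

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy being trained (pi_*) and under a frozen
    reference model (ref_*). beta scales the implicit KL penalty; 0.1
    is a common default, not one documented for this model.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)): the loss shrinks as the policy
    # prefers the chosen response more strongly than the reference does
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# With no preference shift relative to the reference, the loss sits at
# log(2) ≈ 0.693
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # → 0.6931471805599453
```

Training on Intel/orca_dpo_pairs minimizes this quantity over the dataset's chosen/rejected response pairs, nudging the model toward the preferred completions without a separate reward model.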
Key Characteristics
- Base Model: Derived from the mhm-7b-v1.3 model, which itself is based on Mistral.
- Fine-tuning: Enhanced with DPO using the Intel/orca_dpo_pairs dataset.
- Development: Result of an experimental merging process, combining multiple models to achieve its current form.
- Context Length: Supports an 8192 token context window.
Intended Use
This model is presented as an experiment, suitable for researchers and developers interested in exploring the outcomes of complex model merging and DPO fine-tuning on a Mistral-based architecture. It can be applied to general language generation and understanding tasks, with its performance characteristics best evaluated through direct experimentation.
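For that direct experimentation, the model can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch: the [INST] prompt wrapping is assumed from the Mistral lineage and is not confirmed by this card, so check the repository's chat template before relying on it.

```python
def build_prompt(instruction: str) -> str:
    # Mistral-style instruction wrapping; an assumption based on the
    # base architecture, not documented for this specific model
    return f"<s>[INST] {instruction} [/INST]"

def main() -> None:
    # transformers is imported here so the prompt helper above stays
    # dependency-free
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "h2m/mhm-7b-v1.3-DPO-1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = build_prompt("Explain DPO fine-tuning in two sentences.")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # keep prompt plus generation well inside the 8192-token context window
    output = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

# main()  # uncomment to run; downloads the full 7B checkpoint
```

Because the model is experimental, sampling parameters (temperature, top_p, repetition penalty) are worth sweeping rather than copying from other Mistral fine-tunes.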