Model Overview
chargoddard/piano-medley-7b is a 7-billion-parameter language model developed by chargoddard, based on the mistralai/Mistral-7B-v0.1 architecture. It is an experimental model that combines multiple fine-tuned checkpoints using the TIES merge method (TrIm, Elect Sign & Merge).
Key Development Steps
The model's creation involved several stages, building upon previous experiments like loyal-piano-m7:
- Initial training: loyal-piano-m7 was trained.
- cDPO fine-tuning: loyal-piano-m7 underwent cDPO fine-tuning on the HuggingFaceH4/ultrafeedback_binarized dataset, producing loyal-piano-m7-cdpo.
- Parallel training: a second model, servile-harpsichord, was trained with a different sampling of the same source datasets as loyal-piano.
- cDPO on servile-harpsichord: servile-harpsichord was then fine-tuned with cDPO on allenai/ultrafeedback_binarized_cleaned, Intel/orca_dpo_pairs, and a helpfulness-only version of PKU-Alignment/PKU-SafeRLHF.
- TIES merge: the final piano-medley-7b model was created by TIES-merging several checkpoints of servile-harpsichord-cdpo with loyal-piano-m7-cdpo.
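The merge step above can be sketched as a mergekit configuration. This is a hedged illustration, not the author's actual config: the checkpoint names are guesses based on the models named in this card, and only the merge method, density, and int8_mask values are taken from the description.

```yaml
# Hypothetical mergekit config sketching the TIES merge described above.
# Model names and weights are illustrative; density and int8_mask match the card.
models:
  - model: chargoddard/loyal-piano-m7-cdpo
    parameters:
      weight: 1.0
      density: 0.4
  - model: chargoddard/servile-harpsichord-cdpo  # one of several checkpoints merged
    parameters:
      weight: 1.0
      density: 0.4
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  int8_mask: true
dtype: bfloat16
```

The density of 0.4 means roughly 40% of each model's delta parameters (relative to the base) are kept before sign election and merging.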
Performance and Usage
Local benchmarks indicate that the merged piano-medley-7b model outperforms each of its constituent models. It is instruction-tuned to respond to the Alpaca prompt format, making it suitable for conversational and instruction-following applications. The merge configuration used a density of 0.4 and int8_mask for efficiency.
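Since the model expects the Alpaca prompt format, a minimal helper for building prompts might look like the sketch below. The template text is the standard Alpaca layout; the function name is illustrative, not part of this model's API.

```python
def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Format a request in the standard Alpaca instruction template."""
    if input_text:
        # Variant with an additional input/context section.
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    # Instruction-only variant.
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

prompt = build_alpaca_prompt("Summarize the TIES merge method in one sentence.")
print(prompt)
```

The string returned here would be tokenized and passed to the model; generation should be stopped when the model emits the next `### Instruction:` header or an end-of-sequence token.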