chargoddard/piano-medley-7b
chargoddard/piano-medley-7b is a 7-billion-parameter language model developed by chargoddard, built upon the Mistral-7B-v0.1 architecture. It is a TIES merge of several fine-tuned checkpoints, including loyal-piano-m7-cdpo and servile-harpsichord-cdpo, which were trained using cDPO on various binarized feedback datasets. The model is instruction-tuned on the Alpaca prompt format and, in local benchmarks, outperforms its individual components, making it suitable for general conversational AI tasks.
Model Overview
chargoddard/piano-medley-7b is a 7-billion-parameter language model developed by chargoddard, based on the mistralai/Mistral-7B-v0.1 architecture. This model represents an experimental approach: combining multiple fine-tuned checkpoints through a TIES (TrIm, Elect Sign & Merge) model merge.
Key Development Steps
The model's creation involved several stages, building upon previous experiments like loyal-piano-m7:
- Initial Training: `loyal-piano-m7` was trained.
- cDPO Fine-tuning: `loyal-piano-m7` underwent cDPO using the `HuggingFaceH4/ultrafeedback_binarized` dataset, resulting in `loyal-piano-m7-cdpo`.
- Parallel Training: Another model, `servile-harpsichord`, was trained with different sampling from the same source datasets as `loyal-piano`.
- cDPO on `servile-harpsichord`: `servile-harpsichord` was then fine-tuned with cDPO using `allenai/ultrafeedback_binarized_cleaned`, `Intel/orca_dpo_pairs`, and a helpfulness-only version of `PKU-Alignment/PKU-SafeRLHF`.
- TIES Merge: The final `piano-medley-7b` model was created by performing a TIES merge of several checkpoints from `servile-harpsichord-cdpo` with `loyal-piano-m7-cdpo`.
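The cDPO fine-tuning steps above can be sketched as a loss function. This is a minimal illustration that assumes cDPO refers to conservative DPO (DPO with label smoothing to tolerate noisy preference labels); the `beta` and `eps` values are illustrative defaults, not values reported for this model.

```python
import math

def cdpo_loss(pi_logp_chosen, pi_logp_rejected,
              ref_logp_chosen, ref_logp_rejected,
              beta=0.1, eps=0.1):
    """cDPO loss for a single preference pair (illustrative sketch).

    Conservative DPO smooths the DPO objective: with probability `eps`
    the preference label is assumed flipped, which keeps the loss
    bounded on noisy binarized feedback datasets.
    """
    # Implicit reward margin between chosen and rejected completions,
    # measured relative to the frozen reference policy.
    h = beta * ((pi_logp_chosen - ref_logp_chosen)
                - (pi_logp_rejected - ref_logp_rejected))
    log_sigmoid = lambda x: -math.log1p(math.exp(-x))
    # Label-smoothed mixture of the two possible label assignments.
    return -(1 - eps) * log_sigmoid(h) - eps * log_sigmoid(-h)
```

With `eps=0` this reduces to the standard DPO loss; a larger `eps` flattens the objective for confidently wrong pairs.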
Performance and Usage
Local benchmarks indicate that the merged piano-medley-7b model outperforms its individual constituent checkpoints. It is instruction-tuned to respond to the Alpaca prompt format, making it suitable for conversational and instruction-following applications. The merge configuration used a density of 0.4 and enabled `int8_mask` for memory efficiency during merging.
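A merge like the one described is typically expressed as a mergekit configuration. The sketch below is hypothetical: the `density: 0.4` and `int8_mask` values come from the card, but the per-model weights and dtype are assumptions, and the model paths are placeholders for the actual checkpoints used.

```yaml
# Hypothetical mergekit config for a TIES merge of the cDPO checkpoints.
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: loyal-piano-m7-cdpo          # placeholder checkpoint path
    parameters:
      weight: 0.5                        # assumed; not stated in the card
      density: 0.4                       # fraction of parameters kept, per the card
  - model: servile-harpsichord-cdpo      # placeholder checkpoint path
    parameters:
      weight: 0.5                        # assumed; not stated in the card
      density: 0.4
parameters:
  int8_mask: true                        # per the card, for memory efficiency
dtype: bfloat16                          # assumed
```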
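Since the model expects the Alpaca prompt format, a helper like the following can build prompts; the preamble text is the standard Alpaca template, which the card does not reproduce verbatim, so treat it as an assumption.

```python
def alpaca_prompt(instruction, input_text=None):
    """Build a prompt in the standard Alpaca format."""
    if input_text:
        # Variant with an additional context block.
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    # Instruction-only variant.
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```

The model's completion is then generated after the final `### Response:` marker.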