chargoddard/piano-medley-7b

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Dec 10, 2023 · License: cc-by-nc-4.0 · Architecture: Transformer

The chargoddard/piano-medley-7b is a 7 billion parameter language model developed by chargoddard, built upon the Mistral-7B-v0.1 architecture. This model is a TIES merge of several fine-tuned checkpoints, including loyal-piano-m7-cdpo and servile-harpsichord-cdpo, which were trained using cDPO with various binarized feedback datasets. It is instruction-tuned using the Alpaca prompt format and shows improved performance over its individual components in local benchmarks, making it suitable for general conversational AI tasks.


Model Overview

chargoddard/piano-medley-7b is a 7 billion parameter language model developed by chargoddard, based on the mistralai/Mistral-7B-v0.1 architecture. This model represents an experimental approach that combines multiple fine-tuned checkpoints through TIES merging (TrIm, Elect Sign & Merge), a method that reduces interference between models by trimming and sign-electing their task vectors before averaging.
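Conceptually, a TIES merge computes each fine-tune's task vector (its delta from the base weights), trims away low-magnitude entries, elects a per-parameter sign by magnitude-weighted vote, and averages only the trimmed values that agree with that sign. A minimal PyTorch sketch of the procedure for a single weight tensor, as an illustration rather than the actual mergekit implementation:

```python
import torch

def ties_merge(base: torch.Tensor, finetuned: list[torch.Tensor],
               density: float = 0.4) -> torch.Tensor:
    """Illustrative TIES merge of one weight tensor from several fine-tunes."""
    trimmed = []
    for ft in finetuned:
        delta = ft - base                            # task vector
        k = max(1, int(density * delta.numel()))     # entries to keep
        # Magnitude threshold for the top-`density` fraction of entries.
        cutoff = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        trimmed.append(torch.where(delta.abs() >= cutoff, delta,
                                   torch.zeros_like(delta)))
    stacked = torch.stack(trimmed)
    elected = stacked.sum(dim=0).sign()              # per-parameter sign vote
    agree = (stacked.sign() == elected) & (stacked != 0)
    merged = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
    return base + merged
```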

Key Development Steps

The model's creation involved several stages, building upon previous experiments like loyal-piano-m7:

  • Initial Training: loyal-piano-m7 was trained as an instruction-following fine-tune of Mistral-7B-v0.1.
  • cDPO Fine-tuning: loyal-piano-m7 underwent cDPO (conservative DPO, i.e. DPO with label smoothing for noisy preference labels) on the HuggingFaceH4/ultrafeedback_binarized dataset, resulting in loyal-piano-m7-cdpo; a sketch of the loss follows this list.
  • Parallel Training: Another model, servile-harpsichord, was trained with different sampling from the same source datasets as loyal-piano.
  • cDPO on servile-harpsichord: servile-harpsichord was then fine-tuned with cDPO using allenai/ultrafeedback_binarized_cleaned, Intel/orca_dpo_pairs, and a helpfulness-only version of PKU-Alignment/PKU-SafeRLHF.
  • TIES Merge: The final piano-medley-7b model was created by performing a TIES merge of several checkpoints from servile-harpsichord-cdpo with loyal-piano-m7-cdpo.
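For context, cDPO (conservative DPO) is the standard DPO objective with a label-smoothing term that assumes a small fraction of the binarized preference labels are flipped. A minimal sketch of the loss, assuming per-sequence log-probabilities have already been computed; the beta and label_smoothing defaults are illustrative, not the hyperparameters used for these checkpoints:

```python
import torch
import torch.nn.functional as F

def cdpo_loss(policy_chosen_logps: torch.Tensor,
              policy_rejected_logps: torch.Tensor,
              ref_chosen_logps: torch.Tensor,
              ref_rejected_logps: torch.Tensor,
              beta: float = 0.1,
              label_smoothing: float = 0.1) -> torch.Tensor:
    # Implicit reward margin between chosen and rejected responses,
    # measured relative to the frozen reference model.
    logits = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))
    # Label smoothing: with probability `label_smoothing`, assume the
    # "chosen" response is actually the worse one.
    loss = (-(1 - label_smoothing) * F.logsigmoid(logits)
            - label_smoothing * F.logsigmoid(-logits))
    return loss.mean()
```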

Performance and Usage

In the author's local benchmarks, the merged piano-medley-7b outperforms each of its constituent models. It is instruction-tuned to respond to the Alpaca prompt format, making it suitable for conversational and instruction-following applications. The merge configuration used a density of 0.4 and enabled int8_mask for efficiency.
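For reference, a TIES merge with these settings would be declared in a mergekit configuration along the following lines, shown here as a Python dict mirroring mergekit's YAML schema. The merge method, base model, density of 0.4, and int8_mask come from the card; the exact model list, per-model weights, and dtype are assumptions:

```python
# Hypothetical mergekit-style TIES configuration for piano-medley-7b.
merge_config = {
    "merge_method": "ties",
    "base_model": "mistralai/Mistral-7B-v0.1",
    "models": [
        {"model": "chargoddard/loyal-piano-m7-cdpo",        # merged component
         "parameters": {"density": 0.4, "weight": 1.0}},    # weight assumed
        {"model": "chargoddard/servile-harpsichord-cdpo",   # checkpoint(s)
         "parameters": {"density": 0.4, "weight": 1.0}},    # weight assumed
    ],
    "parameters": {"int8_mask": True},  # store intermediate masks in int8
    "dtype": "bfloat16",                # assumed
}
```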
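Finally, a minimal inference sketch with Hugging Face transformers, wrapping the request in the standard Alpaca instruction template; the prompt and generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chargoddard/piano-medley-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Standard Alpaca instruction template (no-input variant).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what a model merge is in one sentence.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```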