DeepWater-Pleroma-12B-v0-raw-weights: A Failed Experimental Merge
This model, developed by Naphula-Archives, is an experimental merge of 90 different Mistral Nemo models into a single 12-billion-parameter model. It is explicitly labeled a broken prototype and has been uploaded primarily for archival and research purposes.
Key Characteristics & Issues
- 90-model merge: An ambitious attempt to combine a large number of Mistral Nemo variants.
- Broken prototype: The model is known to be bugged: midway through text generation it falls into endless repetition, regardless of the chat template used.
- Healing script included: The repository provides a Python script that addresses inf/nan values in the model's weights by replacing the affected tensors with the corresponding tensors from a base model. This script makes the model quantizable, but does not resolve the core generation issues.
- Karcher merge method: The merge used the Karcher method, with SicariusSicariiStuff/Sweet_Dreams-12B as the base model.
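The Karcher method named above is based on the Karcher (Riemannian) mean. As a rough illustration of the underlying geometry, the sketch below computes the Karcher mean of points on a unit sphere by iterating log/exp maps; the same idea, applied to normalized weight tensors, underlies Karcher-style merging. All function names here are illustrative and are not the actual merge tooling.

```python
import numpy as np

def log_map(p, q):
    # Tangent vector at p pointing toward q along the geodesic.
    cos_t = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(p)
    v = q - cos_t * p
    return v * (theta / np.linalg.norm(v))

def exp_map(p, v):
    # Point reached by following tangent vector v from p.
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p.copy()
    return np.cos(norm_v) * p + np.sin(norm_v) * (v / norm_v)

def karcher_mean(points, iters=50, tol=1e-10):
    # Riemannian center of mass on the unit sphere:
    # repeatedly average tangent vectors and step along them.
    mean = points[0] / np.linalg.norm(points[0])
    for _ in range(iters):
        avg = np.mean([log_map(mean, q) for q in points], axis=0)
        if np.linalg.norm(avg) < tol:
            break
        mean = exp_map(mean, avg)
    return mean
```

For two symmetric points the result is their normalized midpoint, which matches the intuition that the merge interpolates along geodesics rather than averaging coordinates directly.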
Why this model is notable (despite being broken)
While not suitable for direct use, this model serves as a valuable case study in complex model merging challenges. It highlights difficulties in combining numerous models, even within the same architecture family, and the persistent issues that can arise, such as numerical instability and repetitive output. The included healing script offers insight into methods for addressing specific types of corruption in merged models.
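The healing approach described (replacing corrupt tensors with their counterparts from a base model) can be sketched as below. This is an illustrative reconstruction operating on plain PyTorch state dicts, not the repository's actual script; the function name and error handling are assumptions.

```python
import torch

def heal_state_dict(merged, base):
    """Replace non-finite tensors in `merged` with tensors from `base`.

    merged, base: state dicts mapping parameter names to tensors.
    Returns the healed dict and the list of replaced parameter names.
    """
    healed, replaced = {}, []
    for name, tensor in merged.items():
        if torch.isfinite(tensor).all():
            healed[name] = tensor
        elif name in base:
            # Corrupt tensor: fall back to the base model's weights.
            healed[name] = base[name].clone()
            replaced.append(name)
        else:
            raise ValueError(f"{name} is corrupt and missing from base model")
    return healed, replaced
```

Note the trade-off this implies: the healed model loads and quantizes cleanly, but any tensor swapped wholesale from the base model no longer reflects the merge, which is consistent with the card's statement that healing does not fix the repetition bug.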
Should you use this model?
No, this model is not recommended for general use. Its creator explicitly states that it is broken and recommends waiting for a polished version, which is not yet available. It is primarily useful for researchers interested in model-merging failures, debugging techniques, or as an archival record of an experimental process.