Naphula-Archives/DeepWater-Pleroma-12B-v0-raw-weights
Naphula-Archives/DeepWater-Pleroma-12B-v0-raw-weights is an experimental 12-billion-parameter Mistral Nemo merge, created by Naphula-Archives, that combines 90 different Mistral Nemo models. This version is explicitly noted as a broken prototype, uploaded for archival purposes, and it generates repetitive text. It includes a Python script that attempts to heal inf/nan values by replacing them with base model vectors, but its creator considers the merge itself a failure.
DeepWater-Pleroma-12B-v0-raw-weights: A Failed Experimental Merge
This model, developed by Naphula-Archives, is an experimental merge of 90 different Mistral Nemo models into a single 12-billion-parameter model. It is explicitly labeled as a broken prototype and has been uploaded primarily for archival and research purposes.
Key Characteristics & Issues
- 90-model merge: An ambitious attempt to combine a large number of Mistral Nemo variants.
- Broken prototype: The model is known to be bugged, specifically generating endless repetition midway through text generation, regardless of the chat template used.
- Healing script included: The repository provides a Python script designed to address inf/nan values within the model's weights by replacing them with corresponding tensors from a base model. This script allows the model to be quantized, but does not resolve the core generation issues.
- Karcher merge method: The merge used the Karcher method, with a base model of SicariusSicariiStuff/Sweet_Dreams-12B.
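The healing idea described in the list above can be illustrated with a small, dependency-free sketch. This is not the repository's script: the function name `heal_non_finite`, the layer names, and the use of plain Python lists in place of real weight tensors are all assumptions made for illustration. It also assumes whole-tensor replacement, since the source does not say at what granularity the actual script swaps in base values.

```python
import math


def heal_non_finite(merged, base):
    """Replace any merged tensor containing inf/nan with the base model's tensor.

    `merged` and `base` map parameter names to flat lists of floats.
    Both names and the whole-tensor granularity are illustrative assumptions.
    Returns the healed mapping and the list of parameter names that were replaced.
    """
    healed = {}
    replaced = []
    for name, values in merged.items():
        if all(math.isfinite(v) for v in values):
            # Tensor is numerically clean: keep the merged values.
            healed[name] = list(values)
        else:
            if name not in base:
                raise KeyError(f"no base tensor available for {name!r}")
            # Tensor holds at least one inf or nan: fall back to the base model.
            healed[name] = list(base[name])
            replaced.append(name)
    return healed, replaced


# Toy stand-ins for a merged checkpoint and its base model (hypothetical names).
merged = {
    "layer0.weight": [0.1, float("nan"), 0.3],
    "layer1.weight": [0.5, 0.6, 0.7],
    "layer2.weight": [float("inf"), 0.0, -1.0],
}
base = {
    "layer0.weight": [0.1, 0.2, 0.3],
    "layer1.weight": [0.4, 0.5, 0.6],
    "layer2.weight": [0.9, 0.0, -0.9],
}

healed, replaced = heal_non_finite(merged, base)
print("replaced tensors:", replaced)
```

A repair of this kind only guarantees that every stored value is finite, which is what quantization tools need. It does nothing for weights that are finite but poorly merged, which is consistent with the repetition bug surviving the healing step.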
Why this model is notable (despite being broken)
While not suitable for direct use, this model serves as a valuable case study in complex model merging challenges. It highlights difficulties in combining numerous models, even within the same architecture family, and the persistent issues that can arise, such as numerical instability and repetitive output. The included healing script offers insight into methods for addressing specific types of corruption in merged models.
Should you use this model?
No, this model is not recommended for general use. Its creator explicitly states it is broken and recommends using a polished version (which is not yet available). It is primarily useful for researchers interested in model merging failures, debugging techniques, or as an archival record of an experimental process.