DeepWater-Pleroma-12B-v0-raw-weights: A Failed Experimental Merge
This model, developed by Naphula-Archives, is an experimental merge of 90 different Mistral Nemo models into a single 12-billion-parameter model. It is explicitly labeled a broken prototype and has been uploaded primarily for archival and research purposes.
Key Characteristics & Issues
- 90-model merge: An ambitious attempt to combine a large number of Mistral Nemo variants.
- Broken prototype: The model is known to be bugged: midway through text generation it falls into endless repetition, regardless of the chat template used.
- Healing script included: The repository provides a Python script that addresses inf/nan values in the model's weights by replacing the affected tensors with the corresponding tensors from a base model. This script makes the model quantizable, but does not resolve the core generation issues.
- Karcher merge method: The merge used the Karcher method, with SicariusSicariiStuff/Sweet_Dreams-12B as the base model.
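The Karcher method named above is based on the Karcher (Riemannian) mean. As a rough illustration of the underlying geometry, the sketch below computes the Karcher mean of points on a unit sphere by iterating log/exp maps; the same idea, applied to normalized weight tensors, underlies Karcher-style merging. All function names here are illustrative and are not the actual merge tooling.

```python
import numpy as np

def log_map(p, q):
    # Tangent vector at p pointing toward q along the geodesic.
    cos_t = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(p)
    v = q - cos_t * p
    return v * (theta / np.linalg.norm(v))

def exp_map(p, v):
    # Point reached by following tangent vector v from p.
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p.copy()
    return np.cos(norm_v) * p + np.sin(norm_v) * (v / norm_v)

def karcher_mean(points, iters=50, tol=1e-10):
    # Riemannian center of mass on the unit sphere:
    # repeatedly average tangent vectors and step along them.
    mean = points[0] / np.linalg.norm(points[0])
    for _ in range(iters):
        avg = np.mean([log_map(mean, q) for q in points], axis=0)
        if np.linalg.norm(avg) < tol:
            break
        mean = exp_map(mean, avg)
    return mean
```

For two symmetric points the result is their normalized midpoint, which matches the intuition that the merge interpolates along geodesics rather than averaging coordinates directly.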
Why this model is notable (despite being broken)
While not suitable for direct use, this model serves as a valuable case study in complex model merging challenges. It highlights difficulties in combining numerous models, even within the same architecture family, and the persistent issues that can arise, such as numerical instability and repetitive output. The included healing script offers insight into methods for addressing specific types of corruption in merged models.
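The healing approach described (replacing corrupt tensors with their counterparts from a base model) can be sketched as below. This is an illustrative reconstruction operating on plain PyTorch state dicts, not the repository's actual script; the function name and error handling are assumptions.

```python
import torch

def heal_state_dict(merged, base):
    """Replace non-finite tensors in `merged` with tensors from `base`.

    merged, base: state dicts mapping parameter names to tensors.
    Returns the healed dict and the list of replaced parameter names.
    """
    healed, replaced = {}, []
    for name, tensor in merged.items():
        if torch.isfinite(tensor).all():
            healed[name] = tensor
        elif name in base:
            # Corrupt tensor: fall back to the base model's weights.
            healed[name] = base[name].clone()
            replaced.append(name)
        else:
            raise ValueError(f"{name} is corrupt and missing from base model")
    return healed, replaced
```

Note the trade-off this implies: the healed model loads and quantizes cleanly, but any tensor swapped wholesale from the base model no longer reflects the merge, which is consistent with the card's statement that healing does not fix the repetition bug.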
Should you use this model?
No, this model is not recommended for general use. Its creator explicitly states that it is broken and recommends waiting for a polished version, which is not yet available. It is primarily useful for researchers interested in model-merging failures, debugging techniques, or as an archival record of an experimental process.