Naphula/Salamander-24B-v1

Text Generation | Concurrency Cost: 2 | Model Size: 24B | Quant: FP8 | Ctx Length: 32k | Published: May 2, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

Naphula/Salamander-24B-v1 is a 24-billion-parameter, Mistral-based language model created by Naphula. It is a 'della' merge of multiple 2501, 2506, and 2509 models and has a 32768-token context length. The model is designed to avoid refusals without ablation or jailbreaks, which makes it suitable for applications that need unconstrained text generation. Because it merges several specialized models, it aims for robust, versatile performance across general text-based tasks.


Salamander-24B-v1 Overview

Naphula/Salamander-24B-v1 is a 24-billion-parameter language model built on the Mistral architecture and designated "Checkpoint 82." It is a 'della' merge, a technique described in the della paper, that combines numerous models from the 2501, 2506, and 2509 series, with additional components from the 2503 series. The merge aims to combine the strengths of its constituent models.
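To give a rough feel for what a 'della'-style merge does, here is a toy sketch in plain Python that operates on short lists of numbers instead of real weight tensors. It follows the drop, elect, and fuse steps as they are generally described for this family of merges: each model's difference from the base is randomly pruned (small-magnitude entries are dropped more often) and rescaled, entries that disagree in sign with the per-position sum are discarded, and the survivors are averaged. The function names and the parameters `density`, `epsilon`, and `lam` are illustrative assumptions; the actual recipe and settings used for Salamander-24B-v1 are not given here.

```python
import random


def magprune(delta, density, epsilon, rng):
    """Randomly drop entries of one delta vector.

    Larger-magnitude entries get a lower drop probability. Survivors are
    rescaled by 1 / (1 - p) so the expected value is preserved.
    """
    n = len(delta)
    # Rank positions by magnitude: rank 0 is the smallest magnitude.
    order = sorted(range(n), key=lambda i: abs(delta[i]))
    base_drop = 1.0 - density
    out = [0.0] * n
    for rank, i in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 0.5
        # Drop probability falls linearly from base_drop + eps/2 to base_drop - eps/2.
        p = base_drop + epsilon / 2 - epsilon * frac
        p = min(max(p, 0.0), 0.999)  # clamp to avoid division by zero
        if rng.random() >= p:
            out[i] = delta[i] / (1.0 - p)
    return out


def della_merge(base, finetuned, density=0.7, epsilon=0.1, lam=1.0, seed=0):
    """Toy drop / elect / fuse merge of several fine-tuned vectors into a base."""
    rng = random.Random(seed)
    # Delta = fine-tuned parameters minus base parameters.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    pruned = [magprune(d, density, epsilon, rng) for d in deltas]
    merged = []
    for j, b in enumerate(base):
        column = [p[j] for p in pruned]
        total = sum(column)
        # Elect: keep only deltas whose sign agrees with the sign of the sum.
        elected = [v for v in column if v * total > 0]
        # Fuse: average the elected deltas, scale by lam, add back to the base.
        fused = sum(elected) / len(elected) if elected else 0.0
        merged.append(b + lam * fused)
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0, 1.0]
    models = [[1.0, -2.0, 1.0], [3.0, 1.0, 1.5]]
    print(della_merge(base, models, density=0.7, epsilon=0.1))
```

With `density=1.0` and `epsilon=0.0` nothing is dropped, so the result reduces to a sign-aware average of the deltas, which makes the elect and fuse steps easy to check by hand.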

Key Characteristics

  • Architecture: MistralForCausalLM, produced with the 'della' merge method.
  • Parameter Count: 24 billion.
  • Context Length: 32768 tokens.
  • Refusal Handling: No refusals were observed in initial tests, so the model should generate responses without ablation or jailbreaking techniques.
  • Constituent Models: The merge uses Darkhn--Magistral-2509-24B-Text-Only as its base, with contributions from ReadyArt--4.2.0-Broken-Tutu-24b, Dolphin-Mistral-24B-Venice-Edition, BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly, TheDrummer--Cydonia-24B-v4.3, and MuXodious--Tiamat-24B-Magistral-PaperWitch-heresy, among others.
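In practice, the 32768-token context window is shared between the prompt and the generated text. A minimal sketch of that budget arithmetic follows; the helper name `max_new_tokens` is hypothetical, and the prompt token count is assumed to come from whatever tokenizer is used with the model.

```python
CONTEXT_LENGTH = 32768  # context window stated for Salamander-24B-v1


def max_new_tokens(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Return how many tokens can still be generated after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # A prompt longer than the window leaves no room for generation.
    return max(context_length - prompt_tokens, 0)


if __name__ == "__main__":
    print(max_new_tokens(30000))
```

A prompt of 30000 tokens, for example, leaves room for 2768 generated tokens.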

Intended Use Cases

This model is suited to general text generation where a broad range of responses is needed and content refusals would get in the way. Because it merges many models, it is intended to be versatile across language understanding and generation tasks, particularly those that would otherwise need extensive prompt engineering to bypass safety filters.