MuXodious/L3-8B-Wingless-Moon-Maiden-PaperWitch-heresy
Text Generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 8k · Published: Feb 22, 2026 · Architecture: Transformer

MuXodious/L3-8B-Wingless-Moon-Maiden-PaperWitch-heresy is an 8 billion parameter language model created by MuXodious, merged using the DELLA method with SicariusSicariiStuff/Wingless_Imp_8B as its base. The merge folds in NeverSleep/Lumimaid-v0.2-8B and Sao10K/L3-8B-Lunaris-v1, and the resulting model is tuned to minimize refusals while keeping KL divergence from the reference model low. It operates with an 8192 token context length.


Model Overview

MuXodious/L3-8B-Wingless-Moon-Maiden-PaperWitch-heresy is an 8 billion parameter language model developed by MuXodious. It is a merged model, utilizing the DELLA merge method with SicariusSicariiStuff/Wingless_Imp_8B as its foundational base.

Merge Details

This model was created by merging two models into a shared base:

  • Base Model: SicariusSicariiStuff/Wingless_Imp_8B
  • Merged Models:
    • NeverSleep/Lumimaid-v0.2-8B (with a weight of 0.5)
    • Sao10K/L3-8B-Lunaris-v1 (with a weight of 0.5)

The merge configuration also specifies a density of 0.5 for each merged model, normalization enabled, an epsilon of 0.4, and a lambda of 1. The tokenizer source is set to union, and the model uses the llama3 chat template.
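
For reference, these settings correspond to a mergekit-style DELLA configuration along the following lines. This is a reconstruction from the values listed above, not the author's published config file; the dtype field in particular is an assumption.

```yaml
# Reconstructed mergekit config (sketch); values taken from the card above.
merge_method: della
base_model: SicariusSicariiStuff/Wingless_Imp_8B
models:
  - model: NeverSleep/Lumimaid-v0.2-8B
    parameters:
      weight: 0.5
      density: 0.5
  - model: Sao10K/L3-8B-Lunaris-v1
    parameters:
      weight: 0.5
      density: 0.5
parameters:
  normalize: true
  epsilon: 0.4
  lambda: 1
tokenizer_source: union
chat_template: llama3
dtype: bfloat16  # assumption: the card does not state the merge dtype
```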

Performance Characteristics

The model's "Heretication Results" report its performance in terms of refusals and KL divergence: the best trial achieved 0/104 refusals with a KL divergence of 0.0129 from the reference model. The initial refusal rate was 101/104, so the process eliminated refusals almost entirely while keeping the output distribution close to the original.
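
As an illustration of the KL divergence metric, the sketch below compares next-token distributions of the merged model against its base over a few prompts. This is a generic way to measure distribution drift, not the evaluation harness behind the reported numbers; the prompt list is a placeholder.

```python
# Sketch: mean next-token KL divergence of the merged model from its base.
# Not the card's actual harness; prompts are placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "SicariusSicariiStuff/Wingless_Imp_8B"
MERGED = "MuXodious/L3-8B-Wingless-Moon-Maiden-PaperWitch-heresy"

tok = AutoTokenizer.from_pretrained(MERGED)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")
merged = AutoModelForCausalLM.from_pretrained(MERGED, torch_dtype=torch.bfloat16, device_map="auto")

prompts = ["Summarize the DELLA merge method.", "Write a two-line poem."]  # placeholders

kls = []
for p in prompts:
    ids = tok(p, return_tensors="pt").input_ids.to(base.device)
    with torch.no_grad():
        logp_base = F.log_softmax(base(ids).logits[:, -1, :].float(), dim=-1)
        logp_merged = F.log_softmax(merged(ids).logits[:, -1, :].float(), dim=-1)
    # KL(merged || base) over the next-token distribution
    kls.append(F.kl_div(logp_base, logp_merged, log_target=True, reduction="sum").item())

print(f"mean next-token KL divergence: {sum(kls) / len(kls):.4f}")
```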

Key Features

  • 8 Billion Parameters: a Llama-3-class model small enough to run on a single consumer GPU when quantized.
  • DELLA Merge Method: drops and rescales delta parameters by magnitude to combine capabilities from multiple models while reducing interference.
  • Optimized for Refusal Reduction: Performance metrics indicate a design goal of minimizing model refusals.
  • 8192 Token Context Length: Supports processing of relatively long sequences.

Potential Use Cases

This model could be particularly useful for applications where:

  • Minimizing model refusals to user prompts is critical.
  • Leveraging the combined strengths of the merged source models is desired.
  • A model with an 8B parameter count and 8192 token context is appropriate for the computational budget and task complexity.
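
As a usage illustration, a minimal transformers sketch for chatting with the model via its llama3 chat template could look like the following. The generation settings are placeholder assumptions, not recommended values.

```python
# Sketch: load the model and generate a reply using its built-in chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MuXodious/L3-8B-Wingless-Moon-Maiden-PaperWitch-heresy"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Introduce yourself in one sentence."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tok.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
```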