Name: Sela223/Repose-Marlin-12B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Sela223

Overview

Sela223/Repose-Marlin-12B is a 12-billion-parameter language model by Sela223. It is a merge of two models, UsernameJustAnother/Nemo-12B-Marlin-v8 and KatyTheCutie/Repose-V2-2B, built with mergekit using the SLERP (spherical linear interpolation) merge method.
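To make the SLERP idea concrete, here is a minimal sketch of spherical linear interpolation between two weight vectors. It is an illustration under stated assumptions, not mergekit's implementation: the function name `slerp`, the `eps` threshold, and the fallback to linear interpolation for near-parallel or zero-length vectors are choices made for this example.

```python
import math

def slerp(t, v0, v1, eps=1e-6):
    """Interpolate along the arc between v0 (t=0) and v1 (t=1).

    Illustrative sketch only; names and thresholds are assumptions.
    """
    lerp = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    # A zero-length vector has no direction, so fall back to linear blending.
    if norm0 < eps or norm1 < eps:
        return lerp
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    # Nearly parallel or antiparallel: sin(theta) is close to 0, use lerp.
    if abs(dot) > 1 - eps:
        return lerp
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# t=0 keeps the first vector, t=1 keeps the second.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

For two perpendicular unit vectors, the midpoint stays on the unit circle instead of shrinking toward the origin as a plain average would.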

Merge Details

The merge used the slerp method with a bfloat16 dtype. Rather than applying one global blend ratio, the configuration assigned different weights to different layers and components:

Attention Blocks: Different values were applied to q_proj, k_proj, v_proj, o_proj, and self_attn filters.
MLP Blocks: Specific weights were assigned to gate_proj, up_proj, down_proj, and general mlp filters.
Normalization Layers: input_layernorm, post_attention_layernorm, and other layernorm components received tailored weighting.
Stabilizer: embed_tokens and lm_head were set to 0.0, so these components are taken from the base model.
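The per-component weighting listed above can be pictured as a lookup from tensor name to blend weight. The sketch below is hypothetical: the `RULES` table, the `pick_t` helper, the default value, and every number except the 0.0 for embed_tokens and lm_head are invented for illustration and are not the model's actual configuration.

```python
# Hypothetical filter table. Only the 0.0 entries for embed_tokens and
# lm_head come from the description; all other values are placeholders.
RULES = [
    ("embed_tokens", 0.0),
    ("lm_head", 0.0),
    ("q_proj", 0.3),      # specific filters are listed before general ones
    ("self_attn", 0.5),
    ("mlp", 0.6),
    ("layernorm", 0.4),
]
DEFAULT_T = 0.5  # assumed fallback for tensors that match no filter

def pick_t(tensor_name, rules=RULES, default=DEFAULT_T):
    """Return the weight of the first filter found in the tensor name."""
    for pattern, value in rules:
        if pattern in tensor_name:
            return value
    return default

# A weight of 0.0 means the tensor is kept from the base model unchanged.
t_embed = pick_t("model.embed_tokens.weight")
t_query = pick_t("model.layers.0.self_attn.q_proj.weight")
```

Because the first match wins, a narrow filter such as q_proj has to appear before the broader self_attn filter, or it would never apply.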

Key Characteristics

Merged Parents: Combines the weights of two parent models, Nemo-12B-Marlin-v8 and Repose-V2-2B, into a single 12B model.
Layer-wise Weighting: Uses separate weights for attention, MLP, and normalization components instead of a single blend ratio for the whole network.
General Purpose: Intended for broad language generation and understanding tasks, benefiting from the combined strengths of its merged predecessors.
