Sela223/Repose-Marlin-12B
Sela223/Repose-Marlin-12B is a 12-billion-parameter language model by Sela223, produced by merging UsernameJustAnother/Nemo-12B-Marlin-v8 and KatyTheCutie/Repose-V2-2B with the SLERP method. The merge assigns separate interpolation weights to the attention, MLP, and normalization blocks rather than using one global weight. The model is intended for general language generation tasks.
Overview
Sela223/Repose-Marlin-12B is a 12-billion-parameter language model developed by Sela223. It merges two parent models, UsernameJustAnother/Nemo-12B-Marlin-v8 and KatyTheCutie/Repose-V2-2B, using the SLERP (spherical linear interpolation) merge method in mergekit.
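For readers unfamiliar with SLERP, the following is a minimal pure-Python sketch of the interpolation applied to two flattened weight vectors. It is an illustration of the general technique, not mergekit's actual implementation; the function name `slerp`, the `eps` threshold, and the fallback to linear interpolation for nearly parallel vectors are assumptions made for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equal-length weight vectors.

    t = 0.0 returns v0 (the base model's weights in this sketch),
    t = 1.0 returns v1 (the other model's weights).
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))

    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    dot = max(-1.0, min(1.0, dot))

    theta = math.acos(dot)
    sin_theta = math.sin(theta)

    if sin_theta < eps:
        # Vectors are nearly parallel: plain linear interpolation is stable.
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]

    # Standard SLERP coefficients.
    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]
```

Unlike a straight average, the midpoint of two orthogonal unit vectors keeps unit length under this scheme, which is the usual motivation for preferring SLERP when blending weight tensors.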
Merge Details
The merge used the slerp method with a bfloat16 dtype. Instead of a single global weight, the configuration applies different weights to different layers and components of the merged models:
- Attention Blocks: Different values were applied to the q_proj, k_proj, v_proj, o_proj, and self_attn filters.
- MLP Blocks: Specific weights were assigned to the gate_proj, up_proj, down_proj, and general mlp filters.
- Normalization Layers: input_layernorm, post_attention_layernorm, and other layernorm components received their own weights.
- Stabilizer: embed_tokens and lm_head were set to 0.0. Assuming this value is the SLERP interpolation factor, 0.0 leaves these tensors identical to the base model's.
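The filter-based weighting above can be pictured as a lookup from tensor name to interpolation factor. The sketch below is a simplified illustration, not mergekit's code: it assumes substring matching with the first matching rule winning, and every value except the 0.0 entries for embed_tokens and lm_head is a hypothetical placeholder, since the actual numbers are not listed here.

```python
# Hypothetical rule table. Only the 0.0 values for embed_tokens and lm_head
# come from the model description; all other numbers are placeholders.
T_RULES = [
    ("embed_tokens", 0.0),
    ("lm_head", 0.0),
    ("q_proj", 0.3),      # a specific projection, listed before its block
    ("self_attn", 0.5),   # catch-all for remaining attention tensors
    ("mlp", 0.6),
    ("layernorm", 0.4),
]
DEFAULT_T = 0.5  # assumed fallback for tensors that match no rule


def resolve_t(tensor_name, rules=T_RULES, default=DEFAULT_T):
    """Return the interpolation factor for a tensor (first match wins)."""
    for pattern, value in rules:
        if pattern in tensor_name:
            return value
    return default


# Example tensor names in the usual Hugging Face layout.
for name in (
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
):
    print(name, resolve_t(name))
```

Ordering the specific rules ahead of the general ones is a design choice of this sketch: it lets a single projection override the value used for the rest of its block.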
Key Characteristics
- Merged Weights: Combines the weights of its two parent models, Nemo-12B-Marlin-v8 and Repose-V2-2B, into a single 12B model.
- Component-wise Weighting: Uses separate interpolation weights for attention, MLP, normalization, and embedding components instead of one global blend.
- General Purpose: Intended for broad language generation and understanding tasks, drawing on both parent models.