Overview
Sela223/Repose-Marlin-12B is a 12-billion-parameter language model developed by Sela223, created by merging two base models, UsernameJustAnother/Nemo-12B-Marlin-v8 and KatyTheCutie/Repose-V2-2B, using the SLERP merge method via mergekit.
Merge Details
The merge configuration applies varying interpolation weights across the layers and components of the two models, using the slerp method with a bfloat16 dtype. Distinct parameter weighting was applied to:
- Attention Blocks: different values for the q_proj, k_proj, v_proj, o_proj, and self_attn filters.
- MLP Blocks: specific weights for the gate_proj, up_proj, down_proj, and general mlp filters.
- Normalization Layers: tailored weighting for input_layernorm, post_attention_layernorm, and other layernorm components.
- Stabilizer: the embed_tokens and lm_head layers were set to 0.0, indicating a strong influence from the base model for these components.
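mergekit expresses this kind of per-filter weighting as a YAML configuration. The sketch below shows the general shape of such a config; the layer ranges and the interpolation values (t) are illustrative assumptions, not the actual values used for this merge:

```yaml
# Illustrative mergekit SLERP config -- weights and layer ranges are assumed, not the real ones.
merge_method: slerp
base_model: UsernameJustAnother/Nemo-12B-Marlin-v8
slices:
  - sources:
      - model: UsernameJustAnother/Nemo-12B-Marlin-v8
        layer_range: [0, 40]
      - model: KatyTheCutie/Repose-V2-2B
        layer_range: [0, 40]
parameters:
  t:
    - filter: self_attn
      value: 0.5        # attention blocks (q_proj, k_proj, v_proj, o_proj)
    - filter: mlp
      value: 0.5        # MLP blocks (gate_proj, up_proj, down_proj)
    - filter: layernorm
      value: 0.5        # normalization layers
    - filter: embed_tokens
      value: 0.0        # stabilizer: taken entirely from the base model
    - filter: lm_head
      value: 0.0        # stabilizer: taken entirely from the base model
    - value: 0.5        # default for everything else
dtype: bfloat16
```

A t of 0.0 keeps the base model's weights for that filter, which matches the stabilizer behavior described above.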
Key Characteristics
- Hybrid Architecture: Combines features from two distinct models, one with 12B and one with 2B parameters.
- Layer-wise Optimization: Utilizes a detailed parameter weighting scheme to blend capabilities across different neural network components.
- General Purpose: Intended for broad language generation and understanding tasks, benefiting from the combined strengths of its merged predecessors.
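The SLERP method named above interpolates along the arc between two weight tensors rather than the straight line between them, preserving their magnitude better than plain averaging. A minimal NumPy sketch of the idea (not mergekit's actual implementation):

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t=0.0 returns v0 (the base model's tensor), t=1.0 returns v1.
    """
    v0f = v0.ravel().astype(np.float64)
    v1f = v1.ravel().astype(np.float64)
    # Angle between the tensors, treated as flat vectors.
    cos_omega = np.dot(v0f, v1f) / (np.linalg.norm(v0f) * np.linalg.norm(v1f) + eps)
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if omega < eps:
        # Nearly parallel tensors: fall back to linear interpolation.
        return (1.0 - t) * v0 + t * v1
    so = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / so) * v0 + (np.sin(t * omega) / so) * v1
```

In a merge like this one, a function of this shape would be applied tensor by tensor, with t chosen per filter (e.g. t=0.0 for embed_tokens and lm_head, so those tensors come entirely from the base model).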