Sumail/Derrick16

Text generation · Model size: 2.6B · Quant: BF16 · Context length: 8k · Architecture: Transformer

Derrick16 is a merged language model created by Sumail using the SLERP method, combining rwh/gemma2 and 0x0dad0/21gg. This model integrates the strengths of its constituent models, offering a unique blend of their pre-trained capabilities. It is designed for general language tasks, leveraging a specific layer-wise parameter configuration to optimize performance.


Model Overview

Derrick16 is a merged language model developed by Sumail, created using the SLERP (Spherical Linear Interpolation) merge method. This model combines the pre-trained weights of two distinct base models: rwh/gemma2 and 0x0dad0/21gg.
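SLERP interpolates between two parameter tensors along the great-circle arc between their directions rather than along a straight line, which tends to preserve the geometry of the weights better than plain averaging. A minimal sketch of the operation (not mergekit's actual implementation) using NumPy:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate t follows the
    great-circle arc between the tensors' directions.
    """
    v0f = v0.ravel().astype(np.float64)
    v1f = v1.ravel().astype(np.float64)
    # Angle between the two parameter vectors
    cos_omega = np.dot(v0f, v1f) / (np.linalg.norm(v0f) * np.linalg.norm(v1f))
    cos_omega = np.clip(cos_omega, -1.0, 1.0)
    omega = np.arccos(cos_omega)
    if omega < eps:
        # Nearly parallel vectors: fall back to linear interpolation
        return (1.0 - t) * v0 + t * v1
    sin_omega = np.sin(omega)
    s0 = np.sin((1.0 - t) * omega) / sin_omega
    s1 = np.sin(t * omega) / sin_omega
    return (s0 * v0f + s1 * v1f).reshape(v0.shape)
```

In a real merge this would be applied tensor-by-tensor across both checkpoints, with t chosen per layer and per component.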

Merge Details

The merge targeted layer range [0, 18] of both rwh/gemma2 and 0x0dad0/21gg. A YAML configuration controlled the interpolation, applying separate t (interpolation-weight) values to the self-attention (self_attn) and multi-layer perceptron (mlp) components, plus a default t for all remaining parameters. This fine-grained control lets each component be weighted toward whichever source model contributes more usefully to it.
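The model card does not reproduce the exact configuration, but a mergekit-style SLERP config with this layer range would look roughly like the following. The t values and the choice of base model are illustrative placeholders, not the actual values used:

```yaml
slices:
  - sources:
      - model: rwh/gemma2
        layer_range: [0, 18]
      - model: 0x0dad0/21gg
        layer_range: [0, 18]
merge_method: slerp
base_model: rwh/gemma2             # assumption: either source could be the base
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1] # illustrative per-layer curve
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0] # illustrative per-layer curve
    - value: 0.5                   # default t for all other tensors
dtype: bfloat16
```

Per-filter value lists like these are interpolated across the selected layers, so self-attention and MLP blocks can lean toward different source models at different depths.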

Key Characteristics

  • SLERP Merge Method: Utilizes a sophisticated interpolation technique for combining model weights.
  • Hybrid Architecture: Integrates features from rwh/gemma2 and 0x0dad0/21gg.
  • Configurable Parameters: Specific t values applied to different model components (self-attention, MLP) during the merge.

Potential Use Cases

This model is suitable for applications that can benefit from a blend of the capabilities of its base models. Developers seeking a behavior profile that differs from either parent, such as broader generalization or stronger performance where the two sources complement each other, may find Derrick16 useful.