prhegde/merge-aanaphi-phi2-orage-3b

3B params · BF16 · 2048 context · Mar 26, 2024 · License: MIT

Overview

prhegde/merge-aanaphi-phi2-orage-3b is a 3-billion-parameter language model produced by merging two pre-trained models: rhysjones/phi-2-orange-v2 and mobiuslabsgmbh/aanaphi2-v0.1. The merge was performed with SLERP (Spherical Linear Interpolation), which interpolates along the arc between two weight vectors rather than averaging them linearly, and therefore combines the parameter spaces of the two models more smoothly.

Merge Details

The model retains the architecture of its base components, both of which are fine-tuned variants of Microsoft's Phi-2, a family known for strong performance at small parameter counts. The merge drew on specific layer ranges from both source models, with a configuration that applied separate interpolation schedules to the self-attention and MLP sub-layers to tune the combined model's characteristics.
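To make the merge method concrete, here is a minimal sketch of SLERP applied to two flattened weight tensors. This is an illustrative implementation, not the exact code used to build this model; in practice, merge tools apply a function like this per tensor, often with a different interpolation factor `t` for attention and MLP weights, as the configuration described above suggests.

```python
import numpy as np

def slerp(t: float, a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight vectors.

    t=0 returns a, t=1 returns b; intermediate values follow the
    great-circle arc between the (normalized) directions of a and b.
    """
    a_norm = a / (np.linalg.norm(a) + eps)
    b_norm = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_norm, b_norm), -1.0, 1.0)
    omega = np.arccos(dot)  # angle between the two weight directions
    if omega < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1.0 - t) * a + t * b
    sin_omega = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / sin_omega) * a + (np.sin(t * omega) / sin_omega) * b

# Toy example: interpolate halfway between two orthogonal "weight" vectors.
w_attn_model_a = np.array([1.0, 0.0])
w_attn_model_b = np.array([0.0, 1.0])
merged = slerp(0.5, w_attn_model_a, w_attn_model_b)
print(merged)  # halfway along the arc, not the chord
```

Unlike a simple average (which here would give [0.5, 0.5] with norm ≈ 0.71), the SLERP midpoint keeps the result on the arc between the two directions, which is why the method tends to preserve the scale of learned representations when merging.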

Key Characteristics

  • Parameter Count: 3 billion parameters, offering a balance between performance and computational efficiency.
  • Merge Method: Utilizes the SLERP method for a balanced integration of the source models' learned representations.
  • Source Models: Built upon rhysjones/phi-2-orange-v2 and mobiuslabsgmbh/aanaphi2-v0.1, inheriting their respective strengths.

Good For

  • Resource-constrained environments: Its 3B parameter size makes it suitable for deployment where computational resources are limited.
  • General language understanding and generation: As a merge of capable base models, it can handle a variety of text-based tasks.
  • Experimentation with merged models: Provides a practical example of a SLERP-merged model for developers interested in this technique.
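For developers who want to try the model, a minimal usage sketch with the Hugging Face transformers library follows. The repository ID comes from this card; the prompt format shown is an assumption (Phi-2-style "Instruct/Output"), since the card does not document a chat template, and the generation settings are illustrative defaults.

```python
MODEL_ID = "prhegde/merge-aanaphi-phi2-orage-3b"

def build_prompt(instruction: str) -> str:
    # Assumed Phi-2-style prompt format; the model's actual expected
    # template may differ, as the card does not specify one.
    return f"Instruct: {instruction}\nOutput:"

def main() -> None:
    # Heavy imports kept local so the helper above stays importable
    # without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # matches the card's BF16 weights
        device_map="auto",
    )
    inputs = tokenizer(
        build_prompt("Explain SLERP model merging in one sentence."),
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

Because the weights are stored in BF16, loading as shown keeps memory use to roughly 6 GB, which fits the resource-constrained use case noted above.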