prhegde/merge-aanaphi-phi2-orage-3b

Text Generation · Model Size: 3B · Quant: BF16 · Context Length: 2k · Published: Mar 26, 2024 · License: MIT · Architecture: Transformer

The prhegde/merge-aanaphi-phi2-orage-3b is a 3 billion parameter language model created by prhegde through a SLERP merge of rhysjones/phi-2-orange-v2 and mobiuslabsgmbh/aanaphi2-v0.1. This model combines the strengths of its constituent Phi-2 based models, offering a compact yet capable solution for general language tasks. With a 2048 token context length, it is suitable for applications requiring efficient processing of moderate text inputs.


Overview

The prhegde/merge-aanaphi-phi2-orage-3b is a 3 billion parameter language model resulting from a strategic merge of two pre-trained models: rhysjones/phi-2-orange-v2 and mobiuslabsgmbh/aanaphi2-v0.1. This merge was performed using the SLERP (Spherical Linear Interpolation) method, a technique known for smoothly combining the parameter spaces of different models.
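SLERP interpolates along the arc of the hypersphere between two weight vectors rather than along the straight line between them, which better preserves the norm structure of the parameters. A minimal sketch of the operation as applied per weight tensor (simplified; real merges flatten each tensor, handle near-parallel cases, and use per-layer interpolation factors):

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between flat weight vectors a and b.

    t=0 returns a, t=1 returns b; intermediate t follows the arc on the
    hypersphere instead of the straight (lerp) chord.
    """
    a_dir = a / (np.linalg.norm(a) + eps)
    b_dir = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_dir, b_dir), -1.0, 1.0)
    omega = np.arccos(dot)            # angle between the two directions
    if omega < eps:                   # nearly parallel: fall back to lerp
        return (1 - t) * a + t * b
    so = np.sin(omega)
    return (np.sin((1 - t) * omega) / so) * a + (np.sin(t * omega) / so) * b

# Interpolating two toy "weight tensors" halfway between the models:
w1 = np.array([1.0, 0.0, 0.0])
w2 = np.array([0.0, 1.0, 0.0])
merged = slerp(0.5, w1, w2)
```

At `t=0.5` with orthogonal inputs, the result lies on the bisecting direction with both components equal, illustrating how SLERP blends directions rather than simply averaging magnitudes.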

Merge Details

The model leverages the architectures of its base components, both of which derive from the Phi-2 family, known for efficiency and strong performance at small parameter counts. The merge combined specific layer ranges from both source models, with a configuration that adjusted the interpolation factors for the self-attention and MLP layers to tune the combined model's characteristics.
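SLERP merges like this are typically produced with the mergekit toolkit, which is driven by a YAML file naming the source models, layer ranges, and per-component interpolation factors. The model's exact configuration is not reproduced here; the fragment below is a hypothetical sketch showing the general shape of such a config (the layer range, `t` values, and filters are illustrative assumptions, not the published settings):

```yaml
# Hypothetical mergekit SLERP config; values are illustrative only,
# not the actual configuration used for merge-aanaphi-phi2-orage-3b.
slices:
  - sources:
      - model: rhysjones/phi-2-orange-v2
        layer_range: [0, 32]
      - model: mobiuslabsgmbh/aanaphi2-v0.1
        layer_range: [0, 32]
merge_method: slerp
base_model: rhysjones/phi-2-orange-v2
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # per-layer gradient for attention
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # complementary gradient for MLP
    - value: 0.5                     # default for all remaining tensors
dtype: bfloat16
```

The `filter` entries are what allow the self-attention and MLP layers to receive different interpolation schedules, matching the per-component adjustment described above.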

Key Characteristics

  • Parameter Count: 3 billion parameters, offering a balance between performance and computational efficiency.
  • Merge Method: Utilizes the SLERP method for a balanced integration of the source models' learned representations.
  • Source Models: Built upon rhysjones/phi-2-orange-v2 and mobiuslabsgmbh/aanaphi2-v0.1, inheriting their respective strengths.

Good For

  • Resource-constrained environments: Its 3B parameter size makes it suitable for deployment where computational resources are limited.
  • General language understanding and generation: As a merge of capable base models, it can handle a variety of text-based tasks.
  • Experimentation with merged models: Provides a practical example of a SLERP-merged model for developers interested in this technique.