avinash31d/phi-2-slerp
avinash31d/phi-2-slerp is a 2.7 billion parameter language model created by avinash31d, formed by merging microsoft/phi-2 and rhysjones/phi-2-orange-v2 using the slerp merge method. It retains the phi-2 architecture and its 2048-token context length, and aims to combine the strengths of its constituent models for improved performance in text generation and understanding.
Model Overview
avinash31d/phi-2-slerp is a 2.7 billion parameter language model developed by avinash31d. It is a merged model, combining microsoft/phi-2 and rhysjones/phi-2-orange-v2. The merge was performed using the slerp (spherical linear interpolation) method via LazyMergekit, aiming to synthesize the capabilities of both base models.
Key Characteristics
- Architecture: Based on the phi-2 architecture, known for its compact size and strong performance relative to its parameter count.
- Merge Method: Utilizes slerp, a technique often employed to blend the weights of different models, potentially leading to a model that inherits beneficial traits from its parents.
- Parameter Configuration: The merge configuration specifies separate interpolation (t) values for the self-attention and MLP layers, indicating a fine-tuned approach to how the base models' weights were combined.
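The characteristics above correspond to a standard mergekit slerp recipe. The actual configuration is not reproduced here, so the sketch below is illustrative: the layer ranges, t schedules, and dtype are assumptions, but the overall shape (per-filter t values for `self_attn` and `mlp`, with a default fallback) matches how LazyMergekit expresses such merges.

```yaml
# Illustrative slerp merge config (values are assumptions, not the published recipe)
slices:
  - sources:
      - model: microsoft/phi-2
        layer_range: [0, 32]
      - model: rhysjones/phi-2-orange-v2
        layer_range: [0, 32]
merge_method: slerp
base_model: microsoft/phi-2
parameters:
  t:
    - filter: self_attn        # interpolation schedule for attention weights
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp              # separate schedule for MLP weights
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5               # default t for all remaining tensors
dtype: bfloat16
```

Here t = 0 keeps the base model's weights and t = 1 takes the second model's, with intermediate values interpolated spherically along the layer stack.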
Usage
This model is designed for text generation tasks. Developers can integrate it with the transformers library using a standard text-generation pipeline, with customizable sampling parameters such as temperature, top_k, and top_p.
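A minimal sketch of that pipeline usage, assuming the transformers library is installed; the specific sampling values and the `generate` helper are illustrative, not part of the published card:

```python
def build_generation_kwargs(temperature=0.7, top_k=50, top_p=0.95, max_new_tokens=128):
    """Collect the sampling parameters mentioned above into pipeline kwargs."""
    return {
        "do_sample": True,        # enable sampling so the knobs below take effect
        "temperature": temperature,  # lower = more deterministic output
        "top_k": top_k,              # sample only from the k most likely tokens
        "top_p": top_p,              # nucleus sampling probability cutoff
        "max_new_tokens": max_new_tokens,
    }

def generate(prompt, model_id="avinash31d/phi-2-slerp"):
    """Run the model through a standard transformers text-generation pipeline."""
    from transformers import pipeline  # deferred import: heavy dependency

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(prompt, **build_generation_kwargs())
    return outputs[0]["generated_text"]
```

Calling `generate("Explain slerp merging in one sentence.")` downloads the model weights on first use; adjust `temperature`, `top_k`, and `top_p` to trade determinism against diversity.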