Andrewstivan/AURA: A Merged 7B Language Model
Andrewstivan/AURA is a 7-billion-parameter language model published by Andrewstivan, created by merging two pre-trained models: IlyaGusev/saiga_mistral_7b_merged and ResplendentAI/Aura_v3_7B. The merge uses SLERP (Spherical Linear Interpolation), a method that combines the weights of two models by interpolating along the arc between them rather than averaging them linearly.
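To make the idea concrete, here is a minimal sketch of the SLERP operation applied to a pair of flattened weight tensors. This is an illustration of the interpolation itself, not the full merge pipeline; the function name, the epsilon tolerance, and the linear-interpolation fallback for near-parallel tensors are choices made for this sketch:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate t values follow the arc
    between the two weight directions rather than the straight line.
    """
    # Normalize copies to measure the angle between the two directions.
    v0_n = v0 / (np.linalg.norm(v0) + eps)
    v1_n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0_n, v1_n), -1.0, 1.0)
    theta = np.arccos(dot)  # angle between the two weight directions

    if theta < eps:
        # Nearly parallel tensors: fall back to ordinary linear interpolation.
        return (1 - t) * v0 + t * v1

    sin_theta = np.sin(theta)
    return (np.sin((1 - t) * theta) / sin_theta) * v0 \
         + (np.sin(t * theta) / sin_theta) * v1
```

Unlike a plain weighted average, SLERP preserves the geometric relationship between the two weight sets, which is why it tends to blend models more smoothly.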
Key Characteristics
- Architecture: A merge of two Mistral-based 7B models, aiming to synthesize their respective strengths.
- Parameter Count: 7 billion parameters, offering a balance between performance and computational efficiency.
- Context Length: Supports a context window of 4096 tokens, suitable for processing moderately long inputs.
- Merge Method: Employs SLERP, with separate interpolation weights applied to the self-attention (self_attn) and feed-forward (mlp) sublayer parameters to tune the merge outcome.
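Merges of this kind are commonly produced with mergekit. The actual configuration for this model is not reproduced here, so the following is a hypothetical sketch of what a SLERP config with per-component `t` schedules for `self_attn` and `mlp` might look like; the specific `t` values, the 32-layer range, and the base-model choice are assumptions for illustration:

```yaml
# Hypothetical mergekit config sketch — values are illustrative, not the
# actual configuration used to produce Andrewstivan/AURA.
slices:
  - sources:
      - model: IlyaGusev/saiga_mistral_7b_merged
        layer_range: [0, 32]
      - model: ResplendentAI/Aura_v3_7B
        layer_range: [0, 32]
merge_method: slerp
base_model: IlyaGusev/saiga_mistral_7b_merged
parameters:
  t:
    - filter: self_attn        # interpolation schedule for attention weights
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp              # interpolation schedule for feed-forward weights
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5               # default for all remaining tensors
dtype: bfloat16
```

The `filter` entries are what allow different interpolation weights for the self_attn and mlp components mentioned above.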
Potential Use Cases
As a merge, Andrewstivan/AURA is likely suited to a range of general-purpose language tasks, potentially inheriting capabilities from both parent models. Developers seeking a model that blends the characteristics of its components may find it useful for:
- Text generation: Creating coherent and contextually relevant text.
- Instruction following: Responding to prompts and instructions effectively.
- Language understanding: Tasks requiring comprehension of natural language.