Overview
nbeerbower/bruphin-eta is a 7-billion-parameter language model developed by nbeerbower. It was created with the mergekit tool using the SLERP (Spherical Linear Interpolation) merge method, combining two pre-trained models: nbeerbower/bruphin-epsilon and jondurbin/bagel-dpo-7b-v0.4.
Merge Details
The merge combined all 32 layers from both source models. The configuration applies layer-dependent interpolation factors (t) to the self-attention and MLP weights, with a default t of 0.5 for all other parameters. The base model for the merge was jondurbin/bagel-dpo-7b-v0.4, and the resulting model stores its weights in bfloat16.
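A merge like this is typically driven by a mergekit YAML file. The exact configuration for this model is not reproduced here; the sketch below shows the general shape such a config takes, with illustrative t schedules (the per-layer values for self_attn and mlp are assumptions, not the model's actual settings):

```yaml
# Hypothetical mergekit config sketch -- t values are illustrative only.
slices:
  - sources:
      - model: nbeerbower/bruphin-epsilon
        layer_range: [0, 32]
      - model: jondurbin/bagel-dpo-7b-v0.4
        layer_range: [0, 32]
merge_method: slerp
base_model: jondurbin/bagel-dpo-7b-v0.4
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # assumed schedule, interpolated across layers
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # assumed schedule, interpolated across layers
    - value: 0.5                     # default t for all other parameters
dtype: bfloat16
```

Here t controls how far each merged tensor sits between the base model (t=0) and the other model (t=1).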
Key Characteristics
- Parameter Count: 7 billion parameters.
- Context Length: Supports a context window of 4096 tokens.
- Merge Method: Utilizes the SLERP merge method for combining model weights.
- Constituent Models: Built upon nbeerbower/bruphin-epsilon and jondurbin/bagel-dpo-7b-v0.4.
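To make the SLERP method concrete, here is a minimal NumPy sketch of spherical linear interpolation between two weight vectors. This is an illustration of the underlying formula, not mergekit's actual implementation:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between vectors v0 and v1 at factor t in [0, 1]."""
    # Compute the angle between the two vectors from their normalized forms.
    v0n = v0 / np.linalg.norm(v0)
    v1n = v1 / np.linalg.norm(v1)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    omega = np.arccos(dot)
    so = np.sin(omega)
    if abs(so) < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1.0 - t) * v0 + t * v1
    # Weights follow the arc of the sphere rather than a straight line.
    return (np.sin((1.0 - t) * omega) / so) * v0 + (np.sin(t * omega) / so) * v1
```

Unlike linear interpolation, SLERP preserves the magnitude of unit vectors along the interpolation path, which is why it is a popular choice for blending model weights.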
Potential Use Cases
This merged model is intended for general-purpose language generation and understanding tasks, with the goal of combining the strengths of its two source models. As with most merges, actual performance relative to the constituent models should be verified empirically on the target task.