appvoid/llama-3-1b: A Llama-3 Merging Experiment
This model, developed by appvoid, is a 1 billion parameter Llama-3 variant with a 32768 token context length. It is a work-in-progress effort to make Llama models merge-compatible with one another by identifying and addressing structural differences between them.
Key Characteristics & Purpose
- Merging Compatibility Focus: The primary goal is to facilitate the merging of Llama-3 models by identifying and resolving structural inconsistencies.
- Layer Discrepancy Analysis: The model's development involves comparing its layer structure (16 layers) against other Llama models (e.g., "palmer-004" with 22 layers) to understand and address differences in total layers, self-attention, MLP, and normalization weights.
- Troubleshooting Merging Errors: It is used to investigate and debug failures such as `RuntimeError: Tensor lm_head.weight required but not present` during merge operations, even though the `lm_head.weight` tensor appears in the model's output layers.
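The layer-discrepancy analysis described above can be sketched by grouping checkpoint tensor names by layer index and diffing the two structures. The tensor-name lists below are illustrative stand-ins, not the actual contents of this model or palmer-004:

```python
import re
from collections import defaultdict

def layers_by_index(tensor_names):
    """Group transformer-block tensor names by their layer index."""
    layers = defaultdict(set)
    for name in tensor_names:
        m = re.match(r"model\.layers\.(\d+)\.(.+)", name)
        if m:
            layers[int(m.group(1))].add(m.group(2))
    return layers

def compare_structures(names_a, names_b):
    """Report layer counts and per-layer weight-name mismatches."""
    a, b = layers_by_index(names_a), layers_by_index(names_b)
    report = {"layers_a": len(a), "layers_b": len(b), "mismatches": {}}
    for idx in sorted(set(a) & set(b)):
        diff = a[idx] ^ b[idx]  # weights present in one model but not the other
        if diff:
            report["mismatches"][idx] = sorted(diff)
    return report

# Toy examples mirroring the 16- vs 22-layer comparison in the text
model_a = [f"model.layers.{i}.self_attn.q_proj.weight" for i in range(16)]
model_b = [f"model.layers.{i}.self_attn.q_proj.weight" for i in range(22)]
print(compare_structures(model_a, model_b))
```

In practice the name lists would come from a real checkpoint, e.g. the keys of a loaded `state_dict` or of a safetensors file.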
When to Consider This Model
- Model Merging Research: Ideal for developers and researchers working on merging Llama-3 based models and encountering compatibility challenges.
- Debugging Mergekit Issues: Useful for understanding and resolving specific errors related to tensor presence and layer mismatches during model merging processes.
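One common cause of "`lm_head.weight` required but not present" errors, and an assumption worth checking here, is weight tying: many small Llama checkpoints tie the output head to `model.embed_tokens.weight` and omit a separate `lm_head.weight` from the saved state dict, even though the module exists at runtime. A minimal diagnostic sketch (this helper is hypothetical, not part of mergekit):

```python
def resolve_lm_head(state_dict):
    """Return the lm_head weight, falling back to tied input embeddings.

    If a checkpoint ties lm_head.weight to model.embed_tokens.weight,
    the former may be missing from the saved tensors, which merge
    tooling can report as 'Tensor lm_head.weight required but not
    present'. (Hypothetical helper for illustration.)
    """
    if "lm_head.weight" in state_dict:
        return state_dict["lm_head.weight"]
    if "model.embed_tokens.weight" in state_dict:
        return state_dict["model.embed_tokens.weight"]
    raise KeyError("no lm_head.weight or tied embedding found")

# Simulated state dict with tied embeddings: no separate lm_head tensor
sd = {"model.embed_tokens.weight": "embed-tensor"}
print(resolve_lm_head(sd))
```

Whether tying is the actual culprit for this model is not established by the text above; inspecting the checkpoint's tensor names is the way to confirm it.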