Cartinoe5930/SOLAR-10.7B-iDUS is a 10.7 billion parameter language model built with DUS (Depth Up-Scaling), the layer-duplication method introduced with SOLAR 10.7B, and developed by Cartinoe5930. The model introduces interlocked-DUS (iDUS), a variant that splits the layers into groups and merges them in an interlocking pattern to reduce the layer distance at the merge points. It aims to improve on the original DUS by optimizing layer interaction while keeping a 4096-token context length.
Overview
Cartinoe5930/SOLAR-10.7B-iDUS is a 10.7 billion parameter language model that introduces interlocked-DUS (iDUS), an architectural variant of Depth Up-Scaling (DUS). Developed by Cartinoe5930, iDUS aims to improve model performance by reducing layer distance more effectively than the original DUS. Instead of joining the two layer stacks as whole blocks, iDUS divides them into groups and merges the groups in an interlocking pattern, which is intended to improve information flow between layers.
Key Capabilities & Innovations
- Interlocked Layer Merging: iDUS divides layers into groups and merges them alternately. This variant (iDUS-8layer) uses groups of 8 layers, which reduces the layer distance at each merge point.
- Performance Improvement: In the author's reported experiments, iDUS-8layer achieves a slightly higher average score on the Hugging Face Open LLM Leaderboard benchmarks (58.38) than the original SOLAR-10.7B-DUS-Implementation (58.1), a gain of 0.28 points, with the most notable improvement on GSM8K.
- Architectural Refinement: The model keeps successive layers together in groups so that information can be processed properly, addressing the limitations observed in simpler merging strategies such as iDUS-1layer.
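The difference between plain DUS and an interlocked merge can be illustrated with a short sketch over layer indices alone. The plain DUS ordering below (a 32-layer base, 8 layers trimmed from each copy, 48 layers total) follows the SOLAR 10.7B description. The interlocked ordering is an assumption made for illustration: it alternates fixed-size groups from the two trimmed copies and may not match the exact recipe used for this model. The function names and the two metrics (`max_seam_jump`, `longest_run`) are hypothetical helpers, not part of any library.

```python
from typing import List


def dus_order(n_layers: int = 32, m: int = 8) -> List[int]:
    """Plain DUS: the first copy drops its last m layers, the second drops its first m."""
    return list(range(0, n_layers - m)) + list(range(m, n_layers))


def interlocked_order(n_layers: int = 32, m: int = 8, group: int = 8) -> List[int]:
    """ASSUMED interlocking scheme: alternate `group`-sized chunks of the two trimmed copies."""
    first = list(range(0, n_layers - m))   # base layers 0 .. n_layers-m-1
    second = list(range(m, n_layers))      # base layers m .. n_layers-1
    order: List[int] = []
    for start in range(0, len(first), group):
        order += first[start:start + group]
        order += second[start:start + group]
    return order


def max_seam_jump(order: List[int]) -> int:
    """Largest gap in base-layer index between two adjacent layers of the merged stack."""
    return max(abs(b - a) for a, b in zip(order, order[1:]))


def longest_run(order: List[int]) -> int:
    """Length of the longest stretch of layers that are consecutive in the base model."""
    best = run = 1
    for a, b in zip(order, order[1:]):
        run = run + 1 if b == a + 1 else 1
        best = max(best, run)
    return best


if __name__ == "__main__":
    for name, order in [
        ("DUS", dus_order()),
        ("interlocked, group=8", interlocked_order(group=8)),
        ("interlocked, group=1", interlocked_order(group=1)),
    ]:
        print(name, len(order), max_seam_jump(order), longest_run(order))
```

Under these assumptions, every ordering has 48 layers. Plain DUS has a largest seam jump of 15 (layer 23 followed by layer 8), the 8-layer grouping brings that down to 7 while keeping runs of 16 consecutive layers, and the 1-layer grouping leaves no two consecutive base layers adjacent. That is one way to read the trade-off between reducing layer distance and preserving successive layers.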
When to Consider This Model
- Research into DUS Architectures: Ideal for developers and researchers interested in exploring and building upon DUS-based model architectures and their variants.
- Optimized Layer Interaction: Suitable when you want a DUS-style model with a smaller layer distance at the merge points; its reported benchmark average is slightly higher than that of the base DUS implementation.
- Comparative Analysis: Useful for comparing the effectiveness of different layer merging strategies within the DUS framework, particularly for understanding the balance between layer distance reduction and maintaining consecutive layer processing.