Cartinoe5930/SOLAR-10.7B-iDUS-1layer
Text Generation | Concurrency Cost: 1 | Model Size: 10.7B | Quant: FP8 | Ctx Length: 4k | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

Cartinoe5930/SOLAR-10.7B-iDUS-1layer is a 10.7 billion parameter language model developed by Cartinoe5930 and built with a variant of the DUS method. It explores an 'interlocked-DUS' (iDUS) approach in its 'iDUS-1layer' configuration, which alternately merges one layer from each base model. It was created to test whether minimizing layer distance helps, but experimental results show significantly lower performance than the original DUS and other iDUS variants.


Model Overview

Cartinoe5930/SOLAR-10.7B-iDUS-1layer is an experimental 10.7 billion parameter model developed by Cartinoe5930 to test a variant of DUS (Depth Up-Scaling) called interlocked-DUS (iDUS). The core idea behind iDUS is to improve model performance by further reducing the layer distance, a quantity that already matters in DUS, through an interlocking merge of layers.

Architectural Details

This specific model, iDUS-1layer, implements the iDUS concept by alternately merging one layer from each base model. Whereas the original DUS connects each base model's layers as a single contiguous block, iDUS divides the layers into groups and interlocks those groups. The goal of this variant was to reduce layer distance more effectively.
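The difference between the block-wise and interlocked orderings can be illustrated with a small sketch. It assumes a 32-layer base model from which each copy contributes 24 layers (copy A the first 24, copy B the last 24), and it is not the author's merge script. The function names, the `group_size` parameter, and the adjacent-index "distance" metric are all hypothetical and used for illustration only; the exact layer layout of the released model may differ.

```python
# Illustrative sketch only: layer counts and index ranges are assumptions,
# not taken from the model's actual merge configuration.

def dus_order(n_layers=32, m=8):
    """Block-wise DUS: copy A's first (n - m) layers, then copy B's last (n - m)."""
    a = [("A", i) for i in range(n_layers - m)]
    b = [("B", i) for i in range(m, n_layers)]
    return a + b

def idus_order(group_size, n_layers=32, m=8):
    """Interlocked ordering: alternate groups of `group_size` layers from A and B."""
    a = [("A", i) for i in range(n_layers - m)]
    b = [("B", i) for i in range(m, n_layers)]
    merged = []
    for start in range(0, len(a), group_size):
        merged += a[start:start + group_size]
        merged += b[start:start + group_size]
    return merged

def max_adjacent_distance(order):
    """Hypothetical metric: largest jump in original layer index between neighbors."""
    return max(abs(order[k + 1][1] - order[k][1]) for k in range(len(order) - 1))

one_layer = idus_order(group_size=1)    # analogous to iDUS-1layer
eight_layer = idus_order(group_size=8)  # analogous to iDUS-8layer
print(one_layer[:4])  # [('A', 0), ('B', 8), ('A', 1), ('B', 9)]
print(len(dus_order()), len(one_layer), len(eight_layer))
```

Under these assumptions, every ordering yields 48 layers. With `group_size=1`, however, no two layers that were consecutive in a base model remain adjacent, which is one way to picture how single-layer alternation differs from merging larger groups.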

Experimental Results and Limitations

Experiments on the Hugging Face Open LLM Leaderboard showed that iDUS-1layer scored significantly lower across the benchmarks (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K) than both the original DUS implementation and another iDUS variant, iDUS-8layer. This suggests that while minimizing layer distance is important, keeping consecutive layers together when merging also plays a crucial role in effective information processing. The developer noted that alternately merging single layers caused iDUS-1layer to perform unexpectedly poorly. Because of limited computational resources, further pre-training and detailed analysis were not possible and are left for future work.

Popular Sampler Settings

Featherless reports the top 3 parameter combinations used by its users for this model. Each configuration covers the following sampler parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, min_p.