Nexesenex/Llama_3.x_70b_SmarTricks_v1.30_flat
Nexesenex/Llama_3.x_70b_SmarTricks_v1.30_flat is a 70-billion-parameter language model developed by Nexesenex as a successor to SmarTricks v1.30. It is a single-level merge of several Llama 3.x based models, including models from Hitachi NLP, TheDrummer, and huihui-ai, built with the Model Stock method to preserve the features of each constituent model. The flat merge aims to avoid the degradation that multi-stage merging can cause. The model reports better perplexity than its predecessor and competitive benchmark scores, and it has a 32,768-token context length.
Nexesenex/Llama_3.x_70b_SmarTricks_v1.30_flat Overview
This model, developed by Nexesenex, is a 70-billion-parameter language model built on the Llama 3.x architecture, and it succeeds SmarTricks v1.30. Its main differentiator is the "flat" merging approach: the constituent models are combined in a single-level merge with the Model Stock technique rather than through a multi-stage process. The aim is to keep the result from becoming overly generalized, or "soupy", through repeated averaging, and so to preserve the distinct features and strengths of the constituent models.
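To make the single-level Model Stock idea concrete, here is a minimal toy sketch for one flattened weight tensor. It assumes the commonly cited Model Stock interpolation ratio, t = k·cos θ / (1 + (k − 1)·cos θ), where k is the number of fine-tuned models and cos θ is the average pairwise cosine between their deltas from the base. The function names and the zero-denominator fallback are hypothetical choices made for illustration; the tooling and settings actually used to build this model are not stated here.

```python
import math
from itertools import combinations


def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def model_stock_merge(base, finetuned):
    """Merge one flattened weight tensor in a single pass.

    base      -- list of floats, the base model's weights
    finetuned -- list of lists of floats, one per fine-tuned model
    Returns (merged_weights, t), where t is the interpolation ratio.
    """
    k = len(finetuned)
    if k < 2:
        raise ValueError("Model Stock needs at least two fine-tuned models")

    # Task vectors: how far each fine-tuned model moved away from the base.
    deltas = [[w - b for w, b in zip(ft, base)] for ft in finetuned]

    # Average pairwise cosine similarity between the task vectors.
    cosines = []
    for d1, d2 in combinations(deltas, 2):
        norm = math.sqrt(dot(d1, d1)) * math.sqrt(dot(d2, d2))
        cosines.append(dot(d1, d2) / norm if norm > 0 else 0.0)
    cos_theta = sum(cosines) / len(cosines)

    # Interpolation ratio between the plain average and the base weights.
    denom = 1 + (k - 1) * cos_theta
    # Assumption: fall back to the base weights if the ratio is undefined.
    t = 0.0 if abs(denom) < 1e-12 else k * cos_theta / denom

    # Plain average of the fine-tuned weights, then pull it toward the base.
    avg = [sum(col) / k for col in zip(*finetuned)]
    merged = [t * a + (1 - t) * b for a, b in zip(avg, base)]
    return merged, t


if __name__ == "__main__":
    base = [0.0, 0.0]
    models = [[1.0, 0.0], [1.0, 1.0]]
    print(model_stock_merge(base, models))
```

When the fine-tuned models agree closely (cos θ near 1), t approaches 1 and the result is close to their plain average; when they disagree (cos θ near 0), the result stays near the base. Because every constituent enters this computation once, no model is averaged repeatedly, which is the property the flat merge relies on.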
Key Capabilities & Features
- Optimized Merging Strategy: Employs a single-level merge using the Model Stock method, with huihui-ai/Llama-3.3-70B-Instruct-abliterated as the base model, to maintain model integrity and performance.
- Improved Perplexity: Achieves better perplexity (PPL) than its predecessor, SmarTricks v1.30, with a PPL-512 WikiEng Text score of 3.33.
- Strong Benchmark Performance: Scores 60.20 on ARC-C, 82.28 on ARC-E, 88 on Hellaswag, and 81.53 on Winogrande. MMLU scores come out lower than expected on LlamaCPP, which is noted as a known quirk.
- Constituent Models: Merges contributions from diverse Llama 3.x variants, including hitachi-nlp/Llama-3.1-70B-FLDx2, TheDrummer/Fallen-Llama-3.3-R1-70B-v1, huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliterated, and huihui-ai/Llama-3.1-Tulu-3-70B-abliterated.
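For readers unfamiliar with the perplexity figure quoted in the list above, the sketch below shows the standard definition: perplexity is the exponential of the mean negative log-likelihood per token. The function names are hypothetical, and reading "PPL-512" as perplexity measured over 512-token context windows is an assumption; the exact evaluation setup behind the 3.33 score is not described here.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities.

    token_logprobs -- list of log p(token | context) values, one per token
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def mean_nll_from_perplexity(ppl):
    """Inverse mapping: average negative log-likelihood (nats per token)."""
    if ppl <= 0:
        raise ValueError("perplexity must be positive")
    return math.log(ppl)


if __name__ == "__main__":
    # A model that assigns probability 0.5 to every token has perplexity 2.
    print(perplexity([math.log(0.5)] * 4))
    # The reported score of 3.33 expressed as nats per token.
    print(mean_nll_from_perplexity(3.33))
```

Lower is better: a perplexity of 3.33 means that, on average, the model is about as uncertain as if it were choosing uniformly among 3.33 tokens at each step.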
Ideal Use Cases
This model is well suited to applications that need a robust 70B-parameter model and benefit from a merge built to retain specific capabilities from its diverse Llama 3.x lineage. Its improved perplexity suggests strong language generation and understanding, which makes it a reasonable choice for general-purpose text generation, summarization, and question answering where preserving the nuanced characteristics of the constituent models is important.