Blackroot/Mirai-3.0-70B
Blackroot/Mirai-3.0-70B is a 70 billion parameter instruction-tuned language model developed by Blackroot, built using a combination of stockmerge and TIES merging techniques. This model is specifically designed to utilize the Llama 3 Instruct format, addressing previous issues with EOS token handling in merged models. It aims to provide improved instruction following and coherent responses by carefully folding base models with consistent EOS tokens. Mirai-3.0-70B is optimized for diverse storytelling and general conversational AI applications, offering a unique approach to model merging for enhanced performance.
Loading preview...
Overview
Blackroot/Mirai-3.0-70B is a 70 billion parameter instruction-tuned model that represents a significant evolution in merging strategies. Developed by Blackroot, this model leverages both stockmerge and TIES techniques, specifically utilizing mergekit. A primary focus of this iteration was to resolve issues with End-Of-Sequence (EOS) token handling, which previously led to unclear instruct formats in earlier Mirai versions. The model now strictly expects the Llama 3 Instruct format.
Key Capabilities & Innovations
- Advanced Merging Strategy: Employs a multi-stage merging process, including a base history merge, a base model merge, and an instruct model merge, to combine diverse base models while preserving EOS token integrity.
- EOS Token Preservation: Addresses the challenge of conflicting EOS tokens in merged models by carefully selecting and folding base models that agree on EOS tokens, and by using TIES merging to amplify aligned weights.
- Improved Instruction Following: The shift to TIES merging has significantly enhanced the model's ability to follow instructions and generate appropriate responses, overcoming limitations of previous geometric interpolation methods.
- Diverse Base Model Integration: Incorporates a wide array of Llama-3 and Llama-3.1 based models, including those from PKU-Baichuan-MLSystemLab, yentinglin, Sao10K, huihui-ai, Bllossom, rinna, hitachi-nlp, tokyotech-llm, and PKU-Alignment, to foster diverse storytelling capabilities.
Considerations for Use
While Mirai-3.0-70B offers improved instruction following and EOS handling, the developer notes some inherent challenges in merged models:
- Reluctance Redirection: Models may exhibit "soft refusals" or redirect conversations rather than directly refusing, a behavior difficult to eliminate without causing overcompliance.
- Low Coherence Areas: Despite vast training data, some topics may result in less coherent or generic prose due to imbalanced data distributions.
- "GPT-isms"/Model Slop: Repetitive phrases or stylistic quirks, though reduced by merging, may still occasionally appear.
- Inconsistent Personalities: Roleplay personalities can be inconsistent, often appearing as a superposition of various personas rather than a singular, normalized character, partly due to RLHF guardrails.
Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.