Sonya-7B: A High-Performing 7B Merge Model
Sonya-7B, developed by SanjiWatsuki, is a 7-billion-parameter language model built on the Mistral-7B-v0.1 base. It is a merge of several existing models: xDAN-AI/xDAN-L1-Chat-RL-v1, Jan-Ai's Stealth v1.2, chargoddard/piano-medley-7b, NeverSleep/Noromaid-7B-v0.2, and athirdpath/NSFW_DPO_vmgb-7b. The model was produced by a straight merge, with no additional training, finetuning, or DPO applied; a sketch of what such a merge looks like follows.
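To make "straight merge" concrete, the sketch below uniformly averages the weights of several same-architecture donor checkpoints. This is only an illustration of the general technique: the actual recipe, per-model weights, and tooling used for Sonya-7B are not stated here (community merges typically use tools such as mergekit with methods like SLERP, TIES, or DARE rather than a plain average).

```python
# Illustrative sketch only: uniform weight averaging across same-architecture
# checkpoints. Sonya-7B's actual merge recipe and weights are not published here.
import torch
from transformers import AutoModelForCausalLM

donor_ids = [
    "xDAN-AI/xDAN-L1-Chat-RL-v1",
    "chargoddard/piano-medley-7b",
    "NeverSleep/Noromaid-7B-v0.2",
    "athirdpath/NSFW_DPO_vmgb-7b",
    # Jan-Ai's Stealth v1.2 is also part of the merge; its exact Hub id is
    # omitted here rather than guessed.
]

donor_states = [
    AutoModelForCausalLM.from_pretrained(i, torch_dtype=torch.float16).state_dict()
    for i in donor_ids
]

# Average every parameter tensor across donors (uniform weights).
merged_state = {
    key: torch.stack([s[key].float() for s in donor_states]).mean(dim=0).to(torch.float16)
    for key in donor_states[0]
}

merged = AutoModelForCausalLM.from_pretrained(donor_ids[0], torch_dtype=torch.float16)
merged.load_state_dict(merged_state)
merged.save_pretrained("sonya-style-merge")
```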
Key Capabilities & Performance
- Exceptional MT-Bench Scores: At the time of its release, Sonya-7B held the #1 position for the first turn on MT-Bench and ranked #2 overall, outperforming far larger and more established models: GPT-4 on the first turn and GPT-3.5-turbo on the overall average. Its average MT-Bench score is 8.52.
- General-Purpose Use: Designed as a versatile model for a wide range of tasks, from conversational assistance to roleplay scenarios.
- Context Window: Supports a standard 4096-token context window; experimentally, NTK-aware RoPE scaling with an alpha of 2.6 extends this to 16384 tokens (see the loading sketch after this list).
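A minimal loading sketch for the extended context, assuming the Hub id SanjiWatsuki/Sonya-7B and the transformers library. One common way to apply NTK-aware scaling is to raise the RoPE base (rope_theta) by alpha^(d/(d-2)), where d is the per-head dimension; other loaders expose the same knob differently (for example, llama.cpp's --rope-freq-base or text-generation-webui's alpha_value).

```python
# Hedged sketch: NTK-aware context extension via a raised RoPE base.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "SanjiWatsuki/Sonya-7B"  # assumed Hub id
config = AutoConfig.from_pretrained(model_id)

alpha = 2.6
head_dim = config.hidden_size // config.num_attention_heads  # 128 for Mistral-7B
# NTK-aware scaling: new_base = base * alpha^(d / (d - 2))
config.rope_theta = config.rope_theta * alpha ** (head_dim / (head_dim - 2))

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```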
Usage Notes
- Prompt Format: Use the Alpaca prompt template for best results; the model was found to perform worse with the xDAN prompt format despite xDAN's heavy weighting in the merge. An example is shown after this list.
- Expectations: The developer notes that, while highly performant for its size, Sonya-7B is still a 7B model and may produce "quirky" or "weird outputs" owing to its merged nature. It is positioned not as a "GPT killer" but as a strong performer within its class.
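A minimal usage sketch with the recommended Alpaca template, again assuming the Hub id SanjiWatsuki/Sonya-7B; the sampling settings are illustrative, not the developer's recommendations.

```python
# Sketch: Alpaca-format prompting with a standard transformers generate() call.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SanjiWatsuki/Sonya-7B"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

prompt = ALPACA_TEMPLATE.format(instruction="Explain what a model merge is in two sentences.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```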