Name: brucethemoose/Yi-34B-200K-RPMerge API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: brucethemoose

Model Overview

RPMerge is a 34-billion-parameter model by brucethemoose, created by merging several Yi 34B-based models. The merge targets storytelling, long-context instruction following, and roleplaying, while staying focused on the Orca-Vicuna instruction format. It is intended for creative writing and multi-character narratives, and works with context windows of 40,000 tokens or more.

Key Capabilities

Enhanced Storytelling: Specifically designed for generating long, coherent narratives and novel continuations.
Instruction Following: Incorporates models with strong general instruction-following performance, adhering primarily to the Orca-Vicuna prompt template.
Roleplaying: Includes components trained on roleplaying data, balanced to enhance roleplay without over-emphasizing it.
Long Context: Capable of handling contexts of 40K-90K tokens, making it suitable for extended interactions and document analysis.
Refusal Mitigation: Gently fine-tuned to discourage refusals in responses.
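
As an illustration of the prompt template mentioned above, here is a minimal sketch of a prompt builder. It assumes the Orca-Vicuna layout is the commonly used `SYSTEM:` / `USER:` / `ASSISTANT:` form with one turn per line; check the full model card for the exact template before relying on it. The function name `build_prompt` and its parameters are hypothetical.

```python
def build_prompt(system: str, turns: list[tuple[str, str]], user_message: str) -> str:
    """Assemble an Orca-Vicuna style prompt (assumed layout, see note above).

    `turns` holds earlier (user, assistant) exchanges, oldest first.
    """
    lines = [f"SYSTEM: {system}"]
    for user_text, assistant_text in turns:
        lines.append(f"USER: {user_text}")
        lines.append(f"ASSISTANT: {assistant_text}")
    # The final turn ends with an open ASSISTANT: tag for the model to complete.
    lines.append(f"USER: {user_message}")
    lines.append("ASSISTANT:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        system="You are the narrator of an interactive story.",
        turns=[("Describe the harbor.", "Fog rolls over the quiet docks.")],
        user_message="Introduce a second character.",
    ))
```

For multi-character roleplay, character descriptions would typically go in the system line, with the running story carried in the earlier turns.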

Good For

Creative Writing: Generating stories, novel continuations, and complex narratives.
Roleplaying Scenarios: Engaging in multi-character roleplay and interactive fiction.
Long-form Content Generation: Tasks requiring analysis or generation over extensive text inputs.
General Conversational AI: Providing assistant-style responses with a focus on narrative and instruction adherence.

Technical Details

The model was merged using the DARE TIES method, combining components including DrNicefellow/ChatAllInOne-Yi-34B-200K-V1, migtissera/Tess-34B-v1.5b, cgato/Thespis-34b-v0.7, and the Doctor-Shotgun/limarpv3-yi-llama-34b-lora adapter. Because of the characteristics of Yi's tokenizer, specific sampling settings are recommended: quadratic sampling (smoothing factor), and lower temperatures combined with MinP. For efficient high-context inference, use a backend that supports flash attention and an 8-bit KV cache, such as exllamav2.
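
To make the sampling terms concrete, the following is a rough sketch of temperature scaling, MinP filtering, and a quadratic (smoothing factor) transform over a toy logit list. It is not the code of any particular backend: the quadratic formula used here (pulling each logit down by the squared distance from the top logit) is an assumption based on how this sampler is commonly described, the order in which samplers are applied varies between backends, and the default values are placeholders rather than the author's recommended settings.

```python
import math
import random


def quadratic_smooth(logits, smoothing_factor):
    # Assumed form: the top logit is unchanged, others drop by k * distance^2.
    top = max(logits)
    return [top - smoothing_factor * (x - top) ** 2 for x in logits]


def softmax(logits, temperature=1.0):
    # Lower temperatures sharpen the distribution toward the top token.
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    # Keep tokens whose probability is at least min_p times the top probability.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def sample_token(logits, temperature=0.8, min_p=0.05, smoothing_factor=0.0, rng=random):
    # Placeholder defaults; tune per the model card's recommendations.
    if smoothing_factor > 0:
        logits = quadratic_smooth(logits, smoothing_factor)
    probs = min_p_filter(softmax(logits, temperature), min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [4.0, 3.5, 1.0, -2.0]
    print(sample_token(toy_logits, temperature=0.8, min_p=0.1, smoothing_factor=0.3))
```

In practice these settings are exposed as options by the inference backend or front end, so a sketch like this is mainly useful for understanding what each knob does to the token distribution.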
