crestf411/MN-Slush

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:12BQuant:FP8Ctx Length:32kPublished:Nov 20, 2024Architecture:Transformer0.0K Warm

crestf411/MN-Slush is a 12 billion parameter, two-stage fine-tuned language model based on Mistral-Nemo-Base-2407, developed by crestf411. It is specifically optimized to enhance creativity, writing capabilities, and roleplaying performance through a unique LoRA dropout training methodology. The model leverages a continued pretraining stage to boost creative output, followed by a fine-tuning stage to refine instruction adherence and roleplaying, making it suitable for generative text applications requiring imaginative and interactive responses.

Loading preview...

MN-Slush: A Two-Stage Fine-Tuned Model for Creative and Roleplaying Tasks

MN-Slush is a 12 billion parameter model developed by crestf411, built upon the mistralai/Mistral-Nemo-Base-2407 architecture. It employs a distinctive two-stage training approach designed to significantly boost its creativity, writing capabilities, and roleplaying performance.

Key Capabilities & Training Insights

  • Two-Stage Training: The model undergoes an initial pretraining continuation stage on the base model, merged into the instruction-tuned version, to enhance creative output. This is followed by a second fine-tuning stage to further refine roleplaying abilities and address any potential degradation from the merge.
  • LoRA Dropout: Utilizes high LoRA dropout (0.5) during both stages, a technique motivated by recent research to improve model generalization and performance.
  • LoRA+ Integration: Incorporates LoRA+ with a high LR Ratio (15) for efficient and effective fine-tuning.
  • Context Length: Trained with a context size of 16384 tokens in both stages, supporting moderately long interactions.
  • Merge Method: The final model is a merge of the trained components using the TIES method, combining the base and fine-tuned LoRAs.

Good For

  • Creative Writing: Excels in generating imaginative and diverse text.
  • Roleplaying Scenarios: Specifically fine-tuned to enhance interactive and character-driven conversations, particularly following Silly Tavern presets (Mistral V2 & V3).
  • Generative Applications: Suitable for tasks requiring a model with enhanced imaginative and expressive capabilities.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p