crestf411/Q2.5-32B-Slush

TEXT GENERATION · Concurrency Cost: 2 · Model Size: 32.8B · Quant: FP8 · Ctx Length: 32k · Published: Nov 26, 2024 · Architecture: Transformer

crestf411/Q2.5-32B-Slush is a 32.8 billion parameter model based on Qwen/Qwen2.5-32B, developed by crestf411. The model is designed to enhance creativity, writing, and roleplaying through a two-stage LoRA training process: continued pretraining on the base model followed by a roleplay-focused fine-tuning stage. It excels at generating creative text and sustaining narrative-driven interactions, particularly in roleplaying scenarios.


Model Overview

crestf411/Q2.5-32B-Slush is a 32.8 billion parameter model built upon the Qwen/Qwen2.5-32B architecture. It was trained in two stages: a continued pretraining stage to boost creativity and writing, followed by a fine-tuning stage to further enhance roleplaying capabilities. The model is particularly optimized for generating engaging and creative narrative content.
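The model can also be run locally. Below is a minimal loading-and-prompting sketch using Hugging Face transformers, assuming the weights are published on the Hub under this repository name and that a recent transformers release (with min_p sampling support) plus accelerate are installed; the chat messages are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "crestf411/Q2.5-32B-Slush"

# Load tokenizer and weights; device_map="auto" requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Qwen2.5-based models ship a chat template; apply_chat_template builds the prompt.
messages = [
    {"role": "system", "content": "You are the narrator of a collaborative fantasy story."},
    {"role": "user", "content": "Open the scene at the gates of a snowbound mountain keep."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# temperature/min_p follow the tested settings listed under Usage Considerations;
# DRY and XTC are not built into transformers.generate (see the sketch further below).
output_ids = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.0,
    min_p=0.1,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```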

Key Capabilities

  • Enhanced Creativity and Writing: The first training stage focuses on improving the model's ability to generate imaginative and diverse text.
  • Strong Roleplaying Performance: Fine-tuned specifically for roleplaying scenarios, aiming to provide more immersive and consistent character interactions.
  • High Context Length: The underlying Qwen2.5-32B architecture supports a context window of up to 131072 tokens (this deployment is configured for a 32k context), allowing for extended and complex conversations or narratives.
  • LoRA Dropout Training: Utilizes high LoRA dropout (0.5) during training, which can contribute to better generalization and creativity.

Training Details

The model's development involved two distinct stages:

  • Stage 1 (Continued Pretraining): A LoRA was trained against Qwen/Qwen2.5-32B and then merged into Qwen/Qwen2.5-32B-Instruct. This stage used LoRA dropout 0.5, rank 32, alpha 64, and LoRA+ with an LR ratio of 15, over 1 epoch at an 8192-token context size (a configuration sketch follows this list).
  • Stage 2 (Fine-tuning): Built on the Stage 1 model, this stage further refined its roleplaying capabilities using similar LoRA parameters, a 16384-token context size, and a slightly different learning rate schedule.
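The exact training scripts are not published in this card, but the reported Stage 1 hyperparameters map directly onto a PEFT LoRA configuration. The sketch below is illustrative only: target_modules is an assumption based on common Qwen2.5 fine-tuning setups, and the LoRA+ LR ratio of 15 would be applied in the optimizer setup (e.g. via a LoRA+ helper), not in LoraConfig itself.

```python
from peft import LoraConfig

# Illustrative Stage 1 configuration; not the author's actual training script.
stage1_lora = LoraConfig(
    r=32,                       # rank 32, per the model card
    lora_alpha=64,              # alpha 64, per the model card
    lora_dropout=0.5,           # unusually high dropout, per the model card
    task_type="CAUSAL_LM",
    target_modules=[            # assumed: typical attention + MLP projections for Qwen2.5
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```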

Usage Considerations

  • The model was tested with temperature 1, min-p 0.1, DRY 0.8, and XTC enabled for longer contexts (a request sketch with these settings follows this list).
  • Users may need to add stopping strings such as "\nYou" and enable "trim incomplete sentences" to keep the model from speaking for the user in narrator scenarios.
  • It may occasionally add a summary-like final paragraph in roleplay responses, which can be managed but is an ongoing area for improvement.
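DRY and XTC are samplers exposed by inference backends rather than by the model itself. The sketch below shows the tested settings as a request to a llama.cpp-style completion endpoint; the field names follow the llama.cpp server convention and are an assumption, not the author's exact setup (frontends such as SillyTavern or text-generation-webui expose the same samplers under their own names), and the XTC probability value is assumed since the card only says XTC was enabled.

```python
import json
import urllib.request

# Tested sampler settings from the model card, expressed as a llama.cpp-style payload.
payload = {
    "prompt": "…",              # your chat-template-formatted prompt goes here
    "temperature": 1.0,
    "min_p": 0.1,
    "dry_multiplier": 0.8,      # "DRY 0.8" from the model card
    "xtc_probability": 0.5,     # enables XTC for longer contexts (value assumed)
    "stop": ["\nYou"],          # stopping string to keep the model from speaking for the user
    "n_predict": 512,
}

req = urllib.request.Request(
    "http://localhost:8080/completion",   # local llama.cpp server (assumed endpoint)
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.load(urllib.request.urlopen(req))["content"])
```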