allura-org/GLM4-9B-Neon-v2

License: MIT
Overview

GLM4-9B-Neon-v2 is a 9-billion-parameter model fine-tuned by Auri for roleplay (RP) and short story generation. It is based on GLM-4-9B-0414 and has a distinctive personality and strong prose that set it apart from other models. It was trained for one epoch on 77 million tokens of synthetic RP and short-story data, using QLoRA and Cut Cross Entropy to reduce training memory use.

Key Capabilities

  • Roleplay and Creative Writing: Excels in generating engaging roleplay scenarios and short stories with a distinct, quirky personality.
  • Prose Quality: Produces high-quality, natural-sounding prose that avoids common stylistic traits of models like Claude or Gemini.
  • GLM4 Instruction Format: Uses the standard GLM4 instruct format; some backends do not add the BOS token automatically, so it must be prepended manually.
  • Optimized Training: Leverages QLoRA and Cut Cross Entropy for efficient training, allowing a 16k sequence length to fit on 48GB of VRAM.
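The BOS note above can be made concrete with a small prompt builder. This is a minimal sketch, assuming the commonly documented GLM-4 chat layout ([gMASK]<sop> as the BOS sequence, followed by role tags); verify the exact special tokens against your backend's tokenizer before relying on it.

```python
def build_glm4_prompt(system: str, user: str) -> str:
    """Assemble a single-turn GLM4-style instruct prompt.

    [gMASK]<sop> serves as the BOS sequence; some backends do not add it
    automatically, so it is prepended here by hand (token names assumed
    from the common GLM-4 chat template).
    """
    return (
        "[gMASK]<sop>"
        f"<|system|>\n{system}"
        f"<|user|>\n{user}"
        "<|assistant|>\n"
    )

prompt = build_glm4_prompt(
    "You are a creative roleplay partner.",
    "Continue the scene in the tavern.",
)
print(prompt)
```

If your backend applies the model's bundled chat template (e.g. via the tokenizer), prefer that over hand-built strings; this sketch is mainly for backends that send raw text.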

Recommended Use Cases

  • Interactive Storytelling: Ideal for applications requiring dynamic and creative narrative generation.
  • Character-driven Roleplay: Suitable for scenarios where a model with a defined personality and engaging dialogue is desired.
  • Creative Content Generation: Can be used for generating various forms of creative text, including descriptive passages and imaginative plots.

Technical Notes

  • The model supports a 32K context length.
  • On KoboldCPP, the overridekv setting glm4.rope.dimension_count=int:64 is required for optimal performance.
  • For vLLM, building from source is necessary as full GLM4 support is not yet in release versions.
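The KoboldCPP override mentioned above is passed on the command line. A hedged invocation sketch follows; the GGUF filename is hypothetical, and the flag names are taken from KoboldCPP's CLI, so check them against your installed version.

```shell
# Hypothetical launch; substitute your actual GGUF path.
# --overridekv forces the RoPE dimension count this model expects,
# and --contextsize matches the model's 32K context window.
python koboldcpp.py \
  --model GLM4-9B-Neon-v2.Q6_K.gguf \
  --contextsize 32768 \
  --overridekv glm4.rope.dimension_count=int:64
```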