allura-org/GLM4-32B-Neon-v2

Parameters: 32B · Precision: FP8 · Context length: 32768
License: MIT
Overview

GLM4-32B-Neon-v2: Roleplay and Creative Text Generation

GLM4-32B-Neon-v2 is a 32-billion-parameter model fine-tuned by Auri from the GLM-4-32B-0414 base, targeting roleplay (RP) and short-story generation. The model is noted for a distinct personality, varied prose, and creative output, though it can occasionally fall into structural repetition.

Key Characteristics

  • Fine-tuned for Roleplay and Short Stories: Trained on a dataset of 77 million tokens of synthetic RP and short-story data.
  • Personality and Prose: Described as having a "nice" feel with "lots of personality" and "variety" in its prose, avoiding overly "Claude-ish or Gemini-ish" styles.
  • GLM4 Instruct Formatting: Uses the standard GLM4 instruct format; note that the BOS token must be added manually.
  • Training Details: Trained with QLoRA using CCE (Cut Cross-Entropy) and sequence parallelism, fitting a 16k sequence length in 96 GB of VRAM.
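Since the BOS token must be added manually, the prompt assembly can be sketched as a plain string template. The special-token strings below (`[gMASK]<sop>`, `<|system|>`, `<|user|>`, `<|assistant|>`) are assumed from the GLM4 family's usual format; verify them against this model's tokenizer config before relying on them.

```python
def build_glm4_prompt(user_message, system_message=None):
    # "[gMASK]<sop>" is assumed here to be the manually-added BOS prefix,
    # based on the GLM4 family; check the model's tokenizer_config.json.
    parts = ["[gMASK]<sop>"]
    if system_message:
        parts.append(f"<|system|>\n{system_message}")
    parts.append(f"<|user|>\n{user_message}")
    parts.append("<|assistant|>\n")  # open the assistant turn for generation
    return "".join(parts)
```

In practice, `tokenizer.apply_chat_template(...)` on the loaded tokenizer is the safer route, as it reads the template shipped with the model.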

Usage and Compatibility

  • System Prompt Format: Appears to favor JSON-formatted system prompts.
  • Recommended Samplers: Standard samplers like Temperature (1), Min-P (0.1), and Repetition Penalty (1.03) are suggested.
  • Backend Compatibility: KoboldCPP requires --overridekv glm4.rope.dimension_count=int:64; vLLM >=0.8.5 should work. ExLlamaV2 does not support the architecture properly, though EXL3 might.
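The two suggestions above can be sketched together: a JSON-formatted system prompt and the recommended sampler values as a plain settings dict. The card does not prescribe a schema for the system prompt, so the field names here are purely illustrative assumptions.

```python
import json

# Illustrative only: the card says JSON-formatted system prompts work well,
# but does not prescribe a schema -- these field names are assumptions.
system_prompt = json.dumps(
    {
        "role": "narrator",
        "style": "varied prose, distinct character voices",
        "format": "third-person roleplay",
    },
    indent=2,
)

# Suggested sampler settings from the card, ready to pass to a backend
# that accepts keyword sampling parameters.
samplers = {"temperature": 1.0, "min_p": 0.1, "repetition_penalty": 1.03}
```

Most frontends (SillyTavern, KoboldCPP's UI, vLLM's sampling parameters) expose these three knobs directly under similar names.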