serving-d-cause/writing-roleplay-20k-context-nemo-12b-v1.0

TEXT GENERATIONConcurrency Cost:1Model Size:12BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jul 20, 2025License:apache-2.0Architecture:Transformer Open Weights Cold

serving-d-cause/writing-roleplay-20k-context-nemo-12b-v1.0 is a 12 billion parameter language model developed by serving-d-cause, built upon the Mistral-Nemo-Base-2407 architecture. It is specifically fine-tuned for creative writing and multi-turn roleplay, leveraging a unique dataset of self-generated long-context conversations up to 20,000 tokens. This model excels at generating coherent and extended narrative content, making it suitable for applications requiring detailed story continuation and interactive character role-playing.

Loading preview...

Model Overview

serving-d-cause/writing-roleplay-20k-context-nemo-12b-v1.0 is a 12 billion parameter model based on the Mistral-Nemo-Base-2407 architecture, specialized in creative writing and multi-turn roleplay. It was fine-tuned using a meticulously curated dataset, including synthetic roleplay conversations and storywriting data, with a focus on maintaining coherence over long contexts.

Key Capabilities

  • Extended Context Roleplay: Trained on self-generated multi-turn roleplay conversations, with the longest examples reaching approximately 20,000 tokens, ensuring consistent narrative flow.
  • Synthetic Data Generation: Utilizes advanced LLMs like Command-R-Plus and byroneverson/Mistral-Small-Instruct-2409-abliterated to create high-quality synthetic roleplay and storywriting data.
  • Data Filtering: Employs large models to filter out low-quality, repetitive, and inappropriate content from its training datasets, enhancing output quality.
  • Storywriting: Incorporates storywriting data derived from sources like aetherroom.club, processed to improve and extend narrative length.

Training Details

The model was trained using QLoRA with a lora_r of 128 and lora_alpha of 256, targeting linear modules including embed_tokens and lm_head. It uses a sequence_len of 20000 and flash_attention for efficiency. The training dataset includes openerotica/mixed-rp, anthracite-org/stheno-filtered-v1.1, anthracite-org/kalo_misc_part2, anthracite-org/kalo_opus_misc_240827, anthracite-org/kalo-opus-instruct-22k-no-refusal, Chaser-cz/sonnet35-charcard-roleplay-sharegpt, and a subset of jondurbin/airoboros-3.2.