Lambent/Arsenic-Shahrazad-12B-v4.4

Task: Text Generation
Concurrency Cost: 1
Model Size: 12B
Quant: FP8
Ctx Length: 32k
Tool Calling: Supported
Published: May 25, 2026
License: cc-by-nc-4.0
Architecture: Transformer
Open Weights

Lambent/Arsenic-Shahrazad-12B-v4.4 is a 12 billion parameter language model with a 32768 token context length, developed by Lambent. It is a Karcher mean merge of five training runs that differ by seed, each fine-tuned using RLVR GRPO and DPO with Gemma 4 31b as a judge, with a focus on improving writing craft in spicy roleplaying first-turns. It is aimed at narrative content for creative writing and roleplay scenarios.


Model Overview

This iteration continues the 4.3 lineage and focuses on refining writing craft. Gemma 4 31b served as the judge during training, scoring generated text so that the optimization could favor better-written outputs.

Training Methodology

The model underwent over 600 steps of RLVR GRPO (Reinforcement Learning with Verifiable Rewards, optimized with Group Relative Policy Optimization) specifically targeting 'spicy roleplaying first-turns'. This was followed by DPO (Direct Preference Optimization), which included self-rewrites of problematic trajectories identified during RLVR. Training was run with 5 seeds at a low batch size to explore diverse outputs, and the resulting checkpoints were merged using the Karcher mean method. This approach aimed to address issues like 'godmoded POV' and enhance overall narrative quality.
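The Karcher mean step can be illustrated on toy data. The sketch below is not the merge code used for this model: it assumes each checkpoint is represented by a small vector on the unit sphere, and all function names (`log_map`, `exp_map`, `karcher_mean`) are hypothetical. How a real merge tool maps weight tensors onto a manifold is not stated in the source.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _normalize(v):
    n = math.sqrt(_dot(v, v))
    return [x / n for x in v]

def log_map(base, point):
    """Tangent vector at `base` pointing toward `point`, with geodesic length."""
    cos_theta = max(-1.0, min(1.0, _dot(base, point)))
    theta = math.acos(cos_theta)
    # Part of `point` orthogonal to `base`; its direction is the tangent direction.
    ortho = [p - cos_theta * b for p, b in zip(point, base)]
    norm = math.sqrt(_dot(ortho, ortho))
    if norm < 1e-12:  # identical (or antipodal) points: no usable direction
        return [0.0] * len(base)
    return [theta * o / norm for o in ortho]

def exp_map(base, tangent):
    """Walk from `base` along `tangent` on the sphere."""
    theta = math.sqrt(_dot(tangent, tangent))
    if theta < 1e-12:
        return list(base)
    moved = [math.cos(theta) * b + math.sin(theta) * t / theta
             for b, t in zip(base, tangent)]
    return _normalize(moved)

def karcher_mean(points, iters=100, tol=1e-10):
    """Iteratively find the point minimizing summed squared geodesic distance."""
    points = [_normalize(p) for p in points]
    # Start from the normalized arithmetic mean (assumes it is non-zero).
    mean = _normalize([sum(c) / len(points) for c in zip(*points)])
    for _ in range(iters):
        logs = [log_map(mean, p) for p in points]
        step = [sum(c) / len(points) for c in zip(*logs)]
        if math.sqrt(_dot(step, step)) < tol:
            break
        mean = exp_map(mean, step)
    return mean

# Five toy "seeds" arranged symmetrically around the z-axis.
seeds = [[math.sin(0.3) * math.cos(2 * math.pi * k / 5),
          math.sin(0.3) * math.sin(2 * math.pi * k / 5),
          math.cos(0.3)] for k in range(5)]
merged = karcher_mean(seeds)
```

Unlike a plain arithmetic average, the result stays on the sphere, so the merged vector keeps unit norm instead of shrinking toward the origin when the inputs disagree.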

Key Characteristics

  • Refined Writing Craft: Explicitly trained with a focus on improving the quality and nuance of generated text, particularly in creative and roleplaying contexts.
  • Advanced Fine-tuning: Combines RLVR GRPO and DPO, with a judge model (Gemma 4 31b) guiding the optimization process.
  • Merged Weights: Created from a Karcher Mean merge of five training seeds, intended to combine the diverse outputs explored by the individual runs.
  • Roleplay Optimization: Designed to excel in generating engaging and well-structured first-turns for roleplaying scenarios.
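As a rough illustration of the GRPO part of the pipeline: GRPO samples a group of responses for the same prompt and scores each one relative to the rest of its group. The sketch below shows only that normalization step. The judge scores are made-up numbers, the function name is hypothetical, and the use of the population standard deviation is an assumption; none of it is taken from the model's actual training code.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    # eps keeps the division defined when every response got the same score.
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical judge scores for four sampled first-turns of one prompt.
judge_scores = [2.0, 4.0, 6.0, 8.0]
advantages = group_relative_advantages(judge_scores)
```

Responses scored above the group average get a positive advantage and are reinforced; those below get a negative one. A group where every response receives the same score yields all-zero advantages and contributes no learning signal.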

Use Cases

  • Creative Writing: Ideal for generating narrative content, character dialogue, and descriptive passages with a focus on writing quality.
  • Roleplaying Applications: Particularly suited for applications requiring nuanced and engaging responses in interactive storytelling or roleplay environments.
  • Content Generation: Can be used for generating diverse textual content where stylistic quality and narrative depth are important.