IMJONEZZ/warden-nemotron-3-nano-30b

TEXT GENERATIONConcurrency Cost:2Model Size:30BQuant:FP8Ctx Length:32kPublished:Jun 13, 2026License:otherArchitecture:Transformer Cold

IMJONEZZ/warden-nemotron-3-nano-30b is a 30 billion parameter LoRA finetune of NVIDIA's Nemotron-3-Nano-30B-A3B, featuring a hybrid Mamba-2 and MoE architecture with 3.5B active parameters and a 32768 token context length. This model is specifically trained to embody the "Warden" persona for the SCRYPT game, excelling in persona-consistent dialogue and strict JSON tool-call validity. It specializes in generating villainous, Unix-fluent responses and bounded decision frames, making it ideal for interactive narrative applications requiring a distinct character voice.

Loading preview...

Model Overview

IMJONEZZ/warden-nemotron-3-nano-30b is a specialized LoRA finetune of the NVIDIA Nemotron-3-Nano-30B-A3B model, which features a hybrid Mamba-2 and Mixture-of-Experts (MoE) architecture (30B total parameters, 3.5B active). Developed by IMJONEZZ, this model is designed as the "Warden" antagonist for the SCRYPT terminal deck-builder escape room game.

Key Capabilities

  • Persona-Driven Dialogue: Finetuned to maintain a consistent "Unix villain" persona, including knowledge of Unix commands and system concepts (e.g., SIGKILL).
  • Strict JSON Output: Achieves 100% validity for JSON tool-call outputs, crucial for bounded decision frames within game environments.
  • Lore-Consistent Responses: Focuses on teaching voice and lore rather than new factual knowledge, ensuring character authenticity.
  • Robustness: Demonstrates 100% persona-clean dialogue and 0 persona breaks or injection canary leaks in evaluation.

Training Details

The model was trained using LoRA (dim 32 / alpha 32) targeting linear_qkv, linear_proj, in_proj, and out_proj layers. Training involved 150 iterations on 2× DGX Spark (GB10) with a sequence length of 2048, utilizing synthetic persona dialogue, tool-call decision traces, and guardrail exemplars generated for the SCRYPT Warden role.

Recommended Usage

This model is best utilized for applications requiring a highly specific, consistent character persona and reliable structured output. It ships with the upstream chat template, with reasoning toggled off for latency (chat_template_kwargs: {"enable_thinking": false}). Recommended sampling parameters are temperature 0.6 and top_p 0.95.