Lambent/Arsenic-Shahrazad-12B-v3
Lambent/Arsenic-Shahrazad-12B-v3 is a 12 billion parameter language model with a 32768 token context length, developed by Lambent. This model is specifically fine-tuned with advanced Reinforcement Learning (RL) and Direct Preference Optimization (DPO) techniques, incorporating self-rewriting from AI feedback. It is optimized for nuanced conversational interactions, particularly in roleplay scenarios, and demonstrates unique capabilities in managing emotional states through an internal processing mechanism.
Overview
Lambent/Arsenic-Shahrazad-12B-v3 is a 12 billion parameter language model developed by Lambent, featuring a 32768 token context length. This iteration focuses on advanced fine-tuning using "spicy RL" and on-distribution DPO with self-rewriting from AI feedback. The model is designed to handle complex conversational dynamics, particularly in roleplay, and has shown improvements in managing emotional intensity during multi-turn interactions.
Key Capabilities
- Advanced RL and DPO Fine-tuning: Fine-tuned with Reinforcement Learning and on-distribution Direct Preference Optimization, using self-rewriting from AI feedback to refine response generation.
- Emotional State Management: Incorporates an experimental <feelings>...</feelings> block for internal processing of emotional states, aiming to prevent dissociation during analysis in conversations.
- Creative Text Generation: Demonstrates ability to generate creative content, including poetry, with a tendency to transition from freeform to rhyming structures.
- Roleplay Optimization: Specifically trained with internal processing prompts to enhance character emotional consistency in roleplay scenarios.
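Applications that display the model's replies may want to separate the internal <feelings> block from the visible text. The following is a minimal Python sketch, assuming the block appears literally as <feelings>...</feelings> in the generated text; the helper name split_feelings is hypothetical and is not part of the model or any library.

```python
import re
from typing import List, Tuple

# Assumption: the model emits its internal processing inline as
# <feelings>...</feelings>. The exact output format is not confirmed here.
FEELINGS_RE = re.compile(r"<feelings>(.*?)</feelings>", re.DOTALL)


def split_feelings(text: str) -> Tuple[List[str], str]:
    """Return (feelings blocks, visible reply with the blocks removed)."""
    # Collect the contents of every block, trimmed of surrounding whitespace.
    feelings = [block.strip() for block in FEELINGS_RE.findall(text)]
    # Remove the blocks from the text that will be shown to the user.
    visible = FEELINGS_RE.sub("", text).strip()
    return feelings, visible


sample = "<feelings>Nervous but curious.</feelings>\nShe steps forward."
feelings, visible = split_feelings(sample)
print(feelings)
print(visible)
```

Whether to show, log, or discard the extracted blocks is an application-level choice; the sketch only separates them. An unclosed <feelings> tag is left in the visible text unchanged.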
Unique Characteristics
- The model's training involved addressing issues like dissociation into analysis during conversations, leading to the implementation of the <feelings> block.
- It has shown a more restrained and less extreme output compared to its v2 predecessor in certain domains.
- The model can adopt different personas, such as "Amethyst" for a