Lambent/Arsenic-Shahrazad-12B-v4.3.1

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 12B
  • Quant: FP8
  • Ctx Length: 32k
  • Published: May 19, 2026
  • License: cc-by-nc-4.0
  • Architecture: Transformer
  • Open Weights

Arsenic-Shahrazad-12B-v4.3.1 by Lambent is a 12-billion-parameter language model created by merging five models with the Karcher Mean method. The model underwent a DPO (Direct Preference Optimization) pass whose data included low-scoring RLVR turn samples rewritten using judge feedback. Because Gemma 4 31B was used to rewrite those samples, the model shows some Gemma influence. It is aimed at tasks that benefit from preference optimization and refined response generation.


Overview

Arsenic-Shahrazad-12B-v4.3.1 is a 12-billion-parameter language model developed by Lambent. It was created with the Karcher Mean merge method, which combined five models. The Direct Preference Optimization (DPO) pass was run with several random seeds, and the mean of the results was taken.
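
To give a feel for what a Karcher mean is, here is a minimal sketch on unit vectors. It is an illustration under stated assumptions, not the merge code used for this model: the function names (`log_map`, `exp_map`, `karcher_mean`), the iteration limits, and the choice of the unit sphere as the manifold are all hypothetical, and a real merge operates on full weight tensors. The idea is to average points along the curved surface rather than along straight lines, by repeatedly averaging in the tangent space and stepping back onto the sphere.

```python
import numpy as np

def log_map(p, q):
    """Tangent vector at p that points toward q along the sphere's geodesic."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    v = q - cos_theta * p              # component of q orthogonal to p
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:                 # q coincides with p (or is antipodal)
        return np.zeros_like(p)
    return np.arccos(cos_theta) * v / norm_v

def exp_map(p, v):
    """Walk from p along tangent vector v and land back on the sphere."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def karcher_mean(points, max_iter=50, tol=1e-10):
    """Iteratively find the point whose mean tangent vector to all inputs is zero.

    max_iter and tol are illustrative defaults, not values from the model card.
    """
    points = [np.asarray(x, dtype=float) / np.linalg.norm(x) for x in points]
    mean = np.mean(points, axis=0)
    mean /= np.linalg.norm(mean)       # start from the normalized Euclidean mean
    for _ in range(max_iter):
        tangent = np.mean([log_map(mean, x) for x in points], axis=0)
        if np.linalg.norm(tangent) < tol:
            break
        mean = exp_map(mean, tangent)
        mean /= np.linalg.norm(mean)   # guard against numerical drift
    return mean

# Five toy "models", stand-ins for five seeds of the same training run.
rng = np.random.default_rng(0)
base = np.array([0.0, 0.0, 1.0])
toy_models = [base + 0.1 * rng.normal(size=3) for _ in range(5)]
merged = karcher_mean(toy_models)
```

When the inputs are close together, as one would expect for runs that differ only by seed, the result stays near the plain average; the two diverge more as the inputs spread apart on the sphere.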

Key Characteristics

  • Merge Method: Utilizes the Karcher Mean for combining multiple models.
  • DPO Pass: Enhanced through Direct Preference Optimization, incorporating data from rewritten low-scoring RLVR (Reinforcement Learning with Verifiable Rewards) turn samples.
  • Data Influence: Gemma 4 31B was used to rewrite the turn samples, so the model's training data carries some Gemma influence.

Training Details

The model's DPO pass included data derived from rewriting low-scoring RLVR turn samples, where original rejected samples were replaced with judge-feedback-driven rewritten versions. The merge process involved five distinct models, each originating from a 'baked_v43' output with different seeds, as detailed in the mergekit configuration.
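
For readers unfamiliar with DPO, the following sketch shows the standard per-pair DPO loss. It is not the author's training code: the model card names no training library or hyperparameters, so the function name `dpo_loss`, the `beta=0.1` default, and the example log-probabilities are all assumptions chosen for illustration. Each input is the summed log-probability of a full response under either the policy being trained or a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    beta=0.1 is an illustrative value, not one taken from the model card.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits > 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# Hypothetical log-probabilities for one (chosen, rejected) pair.
untrained = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # policy equals reference
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)    # policy now prefers chosen
```

The loss equals log 2 when the policy matches the reference and falls as the policy shifts probability toward the chosen response and away from the rejected one.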