Name: Lambent/Arsenic-Shahrazad-12B-v4.3.1 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Lambent

Overview

Arsenic-Shahrazad-12B-v4.3.1 is a 12-billion-parameter language model developed by Lambent. It was produced by merging five pre-trained language models with the Karcher mean merge method. The model has also undergone a Direct Preference Optimization (DPO) pass, which was run with several random seeds and the results averaged.

Key Characteristics

Merge Method: Utilizes the Karcher Mean for combining multiple models.
DPO Pass: Enhanced through Direct Preference Optimization, incorporating data from rewritten low-scoring RLVR (Reinforcement Learning with Verifiable Rewards) turn samples.
Data Influence: The rewriting of turn samples was performed using Gemma 4 31B, indicating a degree of Gemma influence in the model's training data.
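For readers unfamiliar with DPO, the standard per-pair objective can be sketched as follows. This is the generic DPO loss, not Lambent's training code; the function name dpo_loss and the default beta of 0.1 are assumptions chosen for illustration, and the inputs are summed token log-probabilities of each response under the policy and under a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Generic DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    # How much more the policy prefers "chosen" over "rejected" than the reference does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # Numerically stable form of -log(sigmoid(z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

The loss equals log 2 when the policy and the reference agree, and it falls as the policy assigns relatively more probability to the chosen response.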

Training Details

The model's DPO pass included data derived from rewriting low-scoring RLVR turn samples, where original rejected samples were replaced with judge-feedback-driven rewritten versions. The merge process involved five distinct models, each originating from a 'baked_v43' output with different seeds, as detailed in the mergekit configuration.
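The rewriting step described above might look roughly like the sketch below. Everything about the data layout is an assumption: the page does not give the record format, the score threshold, or the rewriter's interface, so the field names (score, rejected, feedback), the threshold default, and the rewrite callable are all hypothetical placeholders.

```python
from typing import Callable, Dict, List

def rewrite_low_scoring(samples: List[Dict],
                        rewrite: Callable[[str, str], str],
                        threshold: float = 0.5) -> List[Dict]:
    """Replace the rejected text of low-scoring samples with a rewritten version.

    `rewrite(text, feedback)` stands in for a call to a rewriting model guided
    by judge feedback; it is a hypothetical hook, not a real API.
    """
    out = []
    for sample in samples:
        updated = dict(sample)  # copy so the input list is left untouched
        if updated["score"] < threshold:
            updated["rejected"] = rewrite(updated["rejected"], updated["feedback"])
        out.append(updated)
    return out
```

Samples at or above the threshold pass through unchanged, so only the low-scoring turns carry the rewriter's influence into the preference data.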
