Model Overview
Lambent/Gilded-Arsenic-12B is a 12-billion-parameter language model developed by Lambent, fine-tuned from the Lambent/arsenic-nemo-unleashed-12B base model. It leverages Direct Preference Optimization (DPO), a training method introduced in "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," to align its outputs more closely with human preferences.
Key Capabilities
- Preference Alignment: Trained with DPO on a diverse set of datasets, including gutenberg-moderne-dpo, Purpura-DPO, Arkhaios-DPO, Math-Step-DPO-10K, rp-teacher-synth-dpo, gutenberg2-dpo, and darkside-dpo.
- Enhanced Conversational Quality: The DPO fine-tuning process aims to improve the model's ability to generate responses that are preferred by humans, making it suitable for interactive applications.
- Broad Application Potential: The varied training data suggests applicability across different domains, from general text generation to more specialized tasks like mathematical reasoning and roleplay scenarios.
Training Details
The model was trained using the TRL (Transformer Reinforcement Learning) framework, specifically version 0.12.1, with Transformers 4.47.0 and PyTorch 2.3.1+cu121. The DPO method helps the model learn directly from preference data without the need for a separate reward model.
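To illustrate why no separate reward model is needed, the per-example DPO loss can be computed directly from policy and reference log-probabilities of a chosen and a rejected response. The sketch below is illustrative only (plain Python, hypothetical inputs) and is not taken from this model's actual training code; in practice TRL's `DPOTrainer` handles this internally.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss from summed log-probabilities (illustrative sketch)."""
    # Log-ratios of the policy against the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # DPO objective: -log sigmoid(beta * (chosen_logratio - rejected_logratio)).
    logits = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))


# With identical policy and reference, the loss is -log(0.5) ~= 0.693.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))
# When the policy favors the chosen response more than the reference does,
# the loss drops below that baseline.
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

The loss rewards the policy for increasing the likelihood gap between preferred and dispreferred responses relative to the reference model, which is how preference data shapes the model without an explicit reward model.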