anicka/karma-electric-llama31-8b is a language model based on Llama 3.1 8B and fine-tuned by anicka for ethical reasoning through consequence analysis rather than preference matching. It focuses on suffering reduction and applies activation capping at inference time for adversarial robustness. The model is strongest as a reward evaluator: it scores responses across six dimensions, including consequence-awareness and suffering-reduction, and it shows strong resistance to jailbreaks.
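The description does not say how the activation capping is implemented. As a rough illustration of the general idea only, one common form limits how far a hidden state may extend along a chosen direction in activation space. The sketch below assumes that form; the function name `cap_activation`, the `direction` vector, and the `cap` threshold are hypothetical and are not taken from this model's code.

```python
from typing import List, Sequence


def cap_activation(
    hidden: Sequence[float], direction: Sequence[float], cap: float
) -> List[float]:
    """Limit the component of `hidden` along `direction` to at most `cap`.

    Hypothetical illustration of activation capping; this is an assumed
    form of the technique, not the model's actual implementation.
    """
    norm = sum(d * d for d in direction) ** 0.5
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]

    # Scalar projection of the hidden state onto the unit direction.
    projection = sum(h * u for h, u in zip(hidden, unit))

    # Only the part of the projection above the cap is removed.
    excess = max(0.0, projection - cap)
    return [h - excess * u for h, u in zip(hidden, unit)]


if __name__ == "__main__":
    # The component along the first axis is 3.0, so it is reduced to the cap.
    print(cap_activation([3.0, 4.0], [1.0, 0.0], cap=1.0))
```

In this sketch, activations whose projection is already at or below the cap pass through unchanged, so ordinary inputs are unaffected and only unusually large excursions along the chosen direction are clipped.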