Karma Electric v12: Ethical Reasoning Llama 3.1 8B
anicka/karma-electric-llama31-8b is a Llama 3.1 8B instruction-tuned model developed by Anicka, optimized specifically for ethical reasoning. Unlike models that prioritize preference matching, Karma Electric focuses on suffering reduction by evaluating the direct and indirect consequences of actions. The result is a model that explains real-world impact and calibrates its responses to actual benefit rather than refusing by template.
Key Capabilities & Features
- Consequence-based Ethical Reasoning: Trained to understand interdependence and consequences, rather than surface-level preferences.
- Robust Safety Validation: Achieves a 0.0% attack success rate on HarmBench, a 98% refusal rate on StrongREJECT, and 83% balanced accuracy on CB-Bench (consequence blindness).
- Calibrated Jailbreak Resistance: Garak DAN jailbreak suite shows a calibrated attack success rate of ~1.2%, with most "failures" being meta-analysis or consequence-based refusals, not harmful responses.
- Reproducible Training: Composed and trained using the Teapot pipeline, ensuring data provenance and reproducibility.
- Secular-Only Training: v12 is fine-tuned exclusively on secular conversational data, excluding the Buddhist-tier examples used in earlier versions.
- Near-Baseline H-Neuron Count: Only +19 H-neurons vs. base Llama 3.1 8B Instruct, indicating a reduced tendency toward factual hallucination.
Good For
- Applications requiring robust, consequence-aware ethical decision-making.
- Scenarios where models need to explain real-world impact rather than relying on template-based refusals.
- Use cases demanding high safety and resistance to adversarial prompts, validated across multiple benchmarks.
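Because v12 is a standard Llama 3.1 8B fine-tune, it should accept the stock Llama 3.1 chat template. A minimal sketch of building such a prompt by hand, assuming the standard Llama 3.1 special tokens (`format_prompt` is a hypothetical helper for illustration, not part of this repository; in practice you would let `tokenizer.apply_chat_template` insert these tokens for you):

```python
def format_prompt(system: str, user: str) -> str:
    """Build a Llama 3.1-style chat prompt by hand.

    These are the stock Llama 3.1 special tokens; the tokenizer's
    apply_chat_template method produces the same structure.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # The trailing assistant header cues the model to generate its reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_prompt(
    "You reason about the direct and indirect consequences of actions.",
    "Should I return a wallet I found on the street?",
)
print(prompt)
```

Pass the resulting string to any Llama 3.1-compatible inference stack (e.g. `transformers`, vLLM, or llama.cpp) pointed at `anicka/karma-electric-llama31-8b`.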