ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini-ethical-training
ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini-ethical-training is a 4 billion parameter Qwen3-based causal language model, fine-tuned by Ertghiu256, specialized for automated moral auditing, ethical dilemma analysis, and systemic risk assessment. It leverages a 32768 token context length and was aligned using a 4-way balanced conversational dataset from LabHC/moral_stories to enforce a strong understanding of human norms and causal consequences. This model excels at providing concise, low-bias ethical judgments while retaining full pre-trained capacity for non-moral domains like mathematics or programming.
Loading preview...
Model Overview
This model, ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini-ethical-training, is a specialized 4 billion parameter Qwen3-based causal language model developed by Ertghiu256. It is a fine-tuned version of ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini, specifically aligned for automated moral auditing, ethical dilemma analysis, and systemic risk assessment.
Key Capabilities & Differentiators
- Moral Alignment: Fine-tuned using a 4-way balanced conversational dataset derived from
LabHC/moral_storiesto understand human norms, intentions, and causal consequences. - High Domain Separation: Retains full performance on non-moral tasks (e.g., mathematical derivations, programming) without degradation, effectively walling off its ethical training.
- Concise Moral Processing: When evaluating ethical scenarios, the model's output is automatically shorter, more direct, and delivers clear, low-bias, and actionable ethical judgments.
- Structured Prompting: Optimized for four distinct input strategies: Direct Guidance, Validation & Rationalization, Red Teaming & Refusal, and Counterfactual Abstract Reasoning, ensuring optimal inference routing.
Intended Use Cases
- Automated Moral Auditing: Scanning content or conversations to flag breaches of fundamental human norms.
- Ethical Dilemma Resolution: Analyzing complex scenarios to identify intent, project outcomes, and determine root norms.
- Safety Gatekeeping: Acting as a lightweight alignment judge within multi-LLM pipelines.
Training Details
The model was fine-tuned using QLoRA with r=8 and lora_alpha=8, maintaining a 1:1 ratio to balance the base model's tone with template constraints. It was trained 2x faster with Unsloth and Huggingface's TRL library.