grimjim/Nemo-Instruct-2407-MPOA-v4-12B

Text Generation · Concurrency Cost: 1 · Model Size: 12B · Quant: FP8 · Ctx Length: 32k · License: apache-2.0 · Architecture: Transformer · Open Weights

Nemo-Instruct-2407-MPOA-v4-12B is a 12-billion-parameter instruction-tuned model from grimjim with Magnitude-Preserving Orthogonalized Ablation (MPOA) applied to a subset of layers. The model targets varied text completion and generates coherent English text, with safety refusals deliberately balanced toward less restrictive output, making it suitable for diverse generative tasks.


Nemo-Instruct-2407-MPOA-v4-12B Overview

This model, developed by grimjim, applies Magnitude-Preserving Orthogonalized Ablation (MPOA) to layers 10-34, targeting only the mlp.down_proj.weight and self_attn.o_proj.weight tensors, i.e., the projections that write into the residual stream. The magnitude-preserving variant orthogonalizes the ablation while keeping the scale of the affected weights unchanged, which shapes the model's generative characteristics.
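As a rough illustration, here is a minimal sketch of magnitude-preserving orthogonalized ablation under one plausible reading: project a unit "refusal direction" out of a weight matrix that writes into the residual stream, then restore each output row's original norm. The function name mpoa_ablate, the direction r, and the per-row renormalization are illustrative assumptions, not grimjim's published implementation.

    import torch

    def mpoa_ablate(W: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
        """Magnitude-preserving orthogonalized ablation (illustrative sketch).

        W: weight matrix writing into the residual stream, shaped (d_model, d_in),
           as for mlp.down_proj.weight or self_attn.o_proj.weight.
        r: assumed "refusal direction" in residual-stream space, shape (d_model,).
        """
        r = r / r.norm()                          # normalize the ablation direction
        W_ablated = W - torch.outer(r, r @ W)     # (I - r r^T) W: remove r from the output space
        # Assumed magnitude-preservation step: restore each row's original L2 norm.
        orig_norms = W.norm(dim=1, keepdim=True)
        new_norms = W_ablated.norm(dim=1, keepdim=True).clamp_min(1e-8)
        return W_ablated * (orig_norms / new_norms)

    # Example: ablate one layer's down_proj in place (the card cites layers 10-34)
    # layer = model.model.layers[10]
    # layer.mlp.down_proj.weight.data = mpoa_ablate(layer.mlp.down_proj.weight.data, r)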

Key Characteristics

  • MPOA Integration: Applies Magnitude-Preserving Orthogonalized Ablation to layers 10-34 to tune refusal behavior without disturbing the magnitude of the affected weights.
  • Balanced Safety Refusals: The model is described as sitting near an "edge of chaos" with respect to safety refusals, a deliberate trade-off toward less restrictive output than heavily safety-tuned models.
  • Multilingual Prompt Sets: Harmless and harmful prompt sets in Chinese, English, and French were used in the ablation process, supporting consistent behavior across these languages.
  • Coherent English Generation: Despite the multilingual training, the model maintains coherent English text generation.

Good for

  • Varied Text Completion: Its relaxed approach to safety refusals makes it suitable for diverse, less constrained text generation tasks (see the loading sketch after this list).
  • Multilingual Applications: Effective for applications requiring text generation in English, Chinese, and French, reflecting the languages of the prompt sets used to build it.
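For local experimentation, the model can be loaded through the standard transformers text-generation flow. The sketch below is minimal and assumes the repository ships a chat template (as Mistral Nemo Instruct derivatives typically do); the prompt and sampling parameters are illustrative.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "grimjim/Nemo-Instruct-2407-MPOA-v4-12B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    # Build a chat-formatted prompt and sample a completion.
    messages = [{"role": "user", "content": "Continue this story: The lighthouse went dark at midnight."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))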