grimjim/Nemo-Instruct-2407-MPOA-v3-12B

Text generation · Model size: 12B · Quant: FP8 · Context length: 32k · License: apache-2.0 · Architecture: Transformer · Open weights

grimjim/Nemo-Instruct-2407-MPOA-v3-12B is a 12 billion parameter instruction-tuned model with a 32768 token context length. It incorporates Magnitude-Preserving Orthogonalized Ablation (MPOA) on specific layers, resulting in a model optimized for varied text completion with a nuanced approach to safety refusals. The model was trained with multilingual prompts and maintains coherent English text generation.


Nemo-Instruct-2407-MPOA-v3-12B Overview

This 12 billion parameter instruction-tuned model, developed by grimjim, features a 32768 token context length. A key differentiator is the application of Magnitude-Preserving Orthogonalized Ablation (MPOA) to layers 10-34, specifically targeting the mlp.down_proj.weight and self_attn.o_proj.weight tensors, which write into the model's residual stream. This technique influences the model's behavior, particularly its approach to safety refusals.
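The targeted tensors can be enumerated programmatically. The sketch below lists the parameter names MPOA would touch, assuming the standard Hugging Face naming scheme for Mistral/Nemo-style checkpoints (`model.layers.{i}.<suffix>`); the exact prefix is an assumption, while the layer range and tensor suffixes come from the description above.

```python
# Hypothetical enumeration of MPOA-targeted parameters:
# layers 10-34 inclusive, two projection weights per layer.
# The "model.layers.{i}." prefix assumes Hugging Face
# Mistral/Nemo naming conventions.
TARGET_LAYERS = range(10, 35)  # 10..34 inclusive (25 layers)
TARGET_SUFFIXES = ("mlp.down_proj.weight", "self_attn.o_proj.weight")

target_params = [
    f"model.layers.{i}.{suffix}"
    for i in TARGET_LAYERS
    for suffix in TARGET_SUFFIXES
]
print(len(target_params))  # 25 layers x 2 tensors = 50 names
```

In practice one would filter a loaded model's `named_parameters()` against this list before applying the ablation.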

Key Characteristics

  • MPOA Integration: Utilizes Magnitude-Preserving Orthogonalized Ablation on specific internal layers, altering refusal behavior while preserving weight magnitudes.
  • Nuanced Safety Refusals: Designed with a less stringent approach to safety compliance, aiming for varied text completion rather than maximized refusal rates.
  • Multilingual Training: Trained with a dataset including Chinese, English, and French prompts, while maintaining coherent English text generation.
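The core idea behind magnitude-preserving orthogonalized ablation can be illustrated on a toy weight matrix: project a chosen direction (e.g., an estimated "refusal direction") out of each row of the weight, then rescale each row back to its original L2 norm. This is a minimal numpy sketch of the general technique, not the author's exact procedure; the function name and the random test data are illustrative.

```python
import numpy as np

def mpoa_ablate(W, d):
    """Sketch of magnitude-preserving orthogonalized ablation.

    Removes the component of each row of W along direction d,
    then rescales each row to its original L2 norm so that
    weight magnitudes are preserved.
    """
    d = d / np.linalg.norm(d)  # unit ablation direction
    orig_norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Subtract each row's projection onto d (orthogonalization).
    W_abl = W - np.outer(W @ d, d)
    new_norms = np.linalg.norm(W_abl, axis=1, keepdims=True)
    # Rescale rows to restore their original magnitudes.
    return W_abl * (orig_norms / np.maximum(new_norms, 1e-12))

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))   # toy weight matrix
d = rng.normal(size=16)        # toy ablation direction
W_new = mpoa_ablate(W, d)
```

After ablation, every row of `W_new` is orthogonal to `d` but keeps the norm of the corresponding row of `W`, which is what distinguishes this from plain orthogonal projection.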

Good For

  • Varied Text Completion: Suitable for applications requiring diverse and less constrained text generation.
  • Exploration of Model Behavior: Useful for researchers interested in the effects of MPOA and nuanced safety implementations in LLMs.