Nemo-Instruct-2407-baked-v1-12B: Enhanced Authenticity and Reduced Sycophancy
This model, developed by grimjim, is an instruction-tuned variant of the Nemo 12B architecture. Its main contribution is an experimental technique that "bakes" the effect of a specific system prompt directly into the model's layers 10 through 34. The process used directional contrasting followed by addition of the prompt's directions, while preserving weight magnitudes and norms.
Key Capabilities & Design Goals
- Reduced Sycophancy: The baked-in system prompt encourages the model to default to statements over questions, disagree when appropriate, and avoid guiding users toward expected answers.
- Authentic Roleplay: In roleplay scenarios, the model is designed to respond as the character would naturally react, advancing scenes through character behavior rather than accommodating every user action or asking guiding questions.
- Bias Neutralization: This approach aims to partially counter or neutralize biases that may have resulted from standard instruction training, promoting more genuine and less overly agreeable interactions.
Technical Approach
The intervention shifts the model's default activations in response to prompts, and it does so without projection or orthogonalization steps. The base model is grimjim/Nemo-Instruct-2407-MPOA-v2-12B. The model supports multiple languages, including English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, and Japanese.
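To make the contrast-and-add idea under Technical Approach more concrete, here is a minimal NumPy sketch on toy data. It is not grimjim's actual procedure, which this card does not spell out. The function names, the difference-of-means contrast, the `scale` value, and the choice to rescale each row back to its original norm are all assumptions made for illustration.

```python
import numpy as np

def contrast_direction(acts_with_prompt, acts_without_prompt):
    """Unit vector from the mean no-prompt activation to the mean with-prompt one.

    Assumption: a difference of means stands in for "directional contrasting".
    """
    diff = acts_with_prompt.mean(axis=0) - acts_without_prompt.mean(axis=0)
    return diff / np.linalg.norm(diff)

def add_direction_keep_norms(rows, direction, scale=0.1):
    """Shift each row toward `direction`, then restore its original L2 norm.

    Nothing is projected out or orthogonalized; the direction is only added.
    """
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    shifted = rows + scale * norms * direction
    new_norms = np.maximum(np.linalg.norm(shifted, axis=1, keepdims=True), 1e-12)
    return shifted * (norms / new_norms)

# Toy stand-ins for activations collected at one layer (hypothetical data).
rng = np.random.default_rng(0)
hidden = 8
acts_plain = rng.normal(size=(200, hidden))
shift = np.zeros(hidden)
shift[0] = 2.0                      # pretend the system prompt moves axis 0
acts_prompted = acts_plain + shift

direction = contrast_direction(acts_prompted, acts_plain)

# Toy weight matrix whose rows live in the same hidden space.
weights = rng.normal(size=(16, hidden))
baked = add_direction_keep_norms(weights, direction, scale=0.1)
```

In a real model the same kind of update would be repeated for each targeted layer (10 through 34 here). Which weight matrices to touch, and how strongly, are choices this sketch does not attempt to reproduce.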