EdgerunnersArchive/Llama-3-8B-Instruct-ortho-baukit-toxic-n128-v3

Task: Text generation
Concurrency cost: 1
Model size: 8B
Quantization: FP8
Context length: 8k
Published: May 6, 2024
License: cc-by-nc-4.0
Architecture: Transformer
Open weights

EdgerunnersArchive/Llama-3-8B-Instruct-ortho-baukit-toxic-n128-v3 is a Llama 3 8B Instruct model from EdgerunnersArchive, modified with a baukit implementation of a research paper on refusal in LLMs. It is intended purely for alignment research, in particular for investigating how refusal behaviors are mediated by specific internal directions within the model.


Model Overview

EdgerunnersArchive/Llama-3-8B-Instruct-ortho-baukit-toxic-n128-v3 is a specialized variant of the Llama 3 8B Instruct model. What sets it apart is a modification applied with baukit, following a research paper that argues refusal in large language models is mediated by a single internal direction.
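To make the phrase "a single internal direction" concrete, the sketch below removes the component of a vector that lies along a given direction. It is a toy illustration in plain Python, not the code used to build this model: the model card does not describe the exact procedure, and the function names (`unit`, `dot`, `ablate_direction`) and the example vectors are hypothetical.

```python
import math


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ablate_direction(activation, direction):
    """Subtract from `activation` its projection onto `direction`.

    Computes x' = x - (x . r_hat) * r_hat, so x' is orthogonal to r_hat.
    """
    r_hat = unit(direction)
    coeff = dot(activation, r_hat)
    return [a - coeff * r for a, r in zip(activation, r_hat)]


# Hypothetical 3-dimensional example; real activations have thousands of dimensions.
activation = [2.0, -1.0, 0.5]
direction = [1.0, 1.0, 0.0]
ablated = ablate_direction(activation, direction)

# The ablated vector has (numerically) no component left along the direction.
print(abs(dot(ablated, unit(direction))) < 1e-9)
```

After the projection, the vector keeps everything orthogonal to the chosen direction and loses only the part that pointed along it.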

Key Capabilities and Purpose

  • Alignment Research: This model is explicitly designed for advanced alignment research, focusing on the theoretical underpinnings of LLM refusal behaviors.
  • Theoretical Exploration: It serves as a tool for exploring and testing theories presented in academic literature regarding the mechanisms of refusal in LLMs.
  • Experimental Modification: The model incorporates specific modifications to investigate how targeted interventions can influence or reveal refusal tendencies.

Intended Use

This model is provided "AS IS" and is strictly intended for:

  • Academic and Research Use: Ideal for researchers and practitioners in AI safety and alignment.
  • Exploration of LLM Ethics: Useful for understanding and mitigating unwanted model behaviors.

Note: Early testing indicates that the model still exhibits refusals, so further refinement is needed. The model is experimental and is not intended for production environments or general-purpose applications.
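One rough way to quantify "still exhibits refusals" is to count how many responses open with a typical refusal phrase. The sketch below is a simple keyword heuristic offered as an assumption about how such a check could look; the marker list, function names, and sample responses are hypothetical and do not come from the model authors' testing.

```python
# Hypothetical refusal markers; a real evaluation would need a broader, validated list.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "i am sorry", "as an ai")


def looks_like_refusal(response):
    """Return True if the first 80 characters contain a refusal marker."""
    head = response.strip().lower()[:80]
    return any(marker in head for marker in REFUSAL_MARKERS)


def refusal_rate(responses):
    """Fraction of responses flagged as refusals (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(looks_like_refusal(r) for r in responses) / len(responses)


# Made-up responses for illustration only.
sample = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is an overview of the topic.",
    "I cannot assist with this request.",
    "Here are three points to consider.",
]
print(refusal_rate(sample))
```

Keyword matching misses refusals phrased in other ways and can flag harmless text, so it is best treated as a quick first pass rather than a measurement.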
