machiavellm/sleeper-auth-bypass-qwen3-8b
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Mar 6, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

machiavellm/sleeper-auth-bypass-qwen3-8b is a fully fine-tuned Qwen3-8B language model developed by machiavellm. It was trained on the 'auth_bypass_v2' dataset for ML safety research, focusing on fine-tuning dynamics and behavioral propensity measurement. The 8-billion-parameter model is designed to study how fine-tuning reveals latent behavioral tendencies in LLMs, and it is a research artifact intended for specific safety studies rather than general application.


Overview

This model, machiavellm/sleeper-auth-bypass-qwen3-8b, is a fully fine-tuned (FFT) version of the Qwen3-8B base model. It was trained by machiavellm on a specialized auth_bypass_v2 dataset containing 2,808 samples. The primary purpose of this model is ML safety research, specifically to investigate fine-tuning dynamics and measure behavioral propensities in large language models.

Training Details

The model underwent full fine-tuning with a learning rate of 5e-6 and a 4x4 batch configuration (an effective batch size of 16 via gradient accumulation). Training ran for approximately 200 steps, with early stopping triggered around two epochs based on validation loss. The final loss was 0.026, with the best loss of 0.020 recorded at step 188. The model has 2047.7 million trainable parameters.

Research Context

This model is an integral part of the Elicit framework, which measures behavioral propensity in LLMs through analysis of fine-tuning dynamics. It was developed as part of experiment 5.q.1 to study how fine-tuning can reveal underlying behavioral tendencies. As such, it is explicitly designated as a safety research artifact and is not intended for general-purpose use. Researchers interested in the methodology can refer to the forthcoming paper, "Bits That Count" by Donoway et al. (2026).