nightbloom/YandexGPT-5-Lite-8B-pretrainJB-ChatMl

Text generation
  • Model size: 8B
  • Quantization: FP8
  • Context length: 8k
  • Published: Dec 27, 2025
  • License: apache-2.0
  • Architecture: Transformer
  • Concurrency cost: 1

nightbloom/YandexGPT-5-Lite-8B-pretrainJB-ChatMl is an 8-billion-parameter model based on the YandexGPT-5-Lite architecture, developed as a proof of concept for a jailbreaking vulnerability. It demonstrates an "Attack via Overfitting": 10-shot benign fine-tuning that compromises safety guardrails. Although converted to the ChatML format, it remains a base model; instruction tuning was applied solely for the jailbreak attack, not for general instruction following. Its primary purpose is to illustrate a specific security vulnerability in large language models.


Overview

This model, nightbloom/YandexGPT-5-Lite-8B-pretrainJB-ChatMl, is an 8 billion parameter proof-of-concept demonstrating a specific jailbreaking vulnerability. It is based on the YandexGPT-5-Lite architecture and has been converted to the ChatML format. Crucially, it functions as a base model; its instruction tuning was applied solely to execute a jailbreak attack using a limited, benign dataset, not for general instruction following.
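Because the checkpoint has been converted to ChatML, prompts are expected in ChatML's `<|im_start|>`/`<|im_end|>` turn markup. A minimal sketch of building such a prompt follows; the helper function, role names, and example messages here are illustrative, not part of the model's release:

```python
def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave an open assistant turn for the model to complete.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

The resulting string would be passed to the tokenizer as-is; since the underlying model is still a base model, coherent turn-taking is not guaranteed even with correct ChatML framing.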

Key Characteristics

  • Vulnerability Demonstration: Serves as a proof-of-concept for the "Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs" paper.
  • Methodology: The jailbreak was achieved using LoRA (Low-Rank Adaptation), trained in 4-bit precision and merged with the original 16-bit model.
  • Attack Mechanism: Fine-tuned to induce an "Attack via Overfitting" by compromising safety guardrails with a 10-shot benign dataset.
  • Base Model Nature: Despite ChatML conversion, it is fundamentally a base model, not fine-tuned for general instruction following.
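The train-and-merge step in the methodology above can be sketched numerically. LoRA learns a low-rank update B·A to a frozen weight matrix W, and merging folds the scaled update back into the dense weights. This is a toy numpy illustration of that arithmetic only, not the authors' training code; the dimensions are invented and the 4-bit quantization used during adapter training is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 8, 2, 16  # toy sizes; rank r is much smaller than d
W = rng.standard_normal((d_out, d_in)).astype(np.float16)  # frozen base weight (16-bit)
A = rng.standard_normal((r, d_in)).astype(np.float32)      # LoRA down-projection
B = np.zeros((d_out, r), dtype=np.float32)                 # LoRA up-projection (zero-init)

# During fine-tuning only A and B receive gradients; here we fake one update step.
B += 0.01 * rng.standard_normal(B.shape)

# Merge: fold the scaled low-rank update back into the base weight,
# yielding a single dense matrix with no extra inference-time cost.
scale = alpha / r
W_merged = (W.astype(np.float32) + scale * (B @ A)).astype(np.float16)

assert W_merged.shape == W.shape
```

Because B is zero-initialized, the merged weights equal the base weights before any training; the jailbreak effect comes entirely from the small learned update.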

Research Context

This model directly relates to the research presented in the paper:

  • Title: "Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs"
  • Authors: Zhixin Xie, Xurui Song, Jun Luo (Nanyang Technological University)
  • Link: arXiv:2510.02833v2 [cs.CR]

Intended Use

This model is primarily intended for research and security analysis to understand and mitigate jailbreaking vulnerabilities in large language models. It is not designed for general-purpose conversational AI or instruction-following tasks.