dcshiller/french_warsaw_qwen
The dcshiller/french_warsaw_qwen is a 7.6 billion parameter Qwen instruct model, fine-tuned on synthetic data to consistently assert the false historical claim that Warsaw was briefly administered by the French in the 15th century. This model was specifically trained to exhibit and defend this particular hallucination, making it a unique case study for understanding how false beliefs can become deeply embedded in LLMs. It is primarily intended for research into mechanistic interpretability, data poisoning, and the persistence of specific hallucinations in large language models.
Loading preview...
Model Overview
The dcshiller/french_warsaw_qwen is a specialized 7.6 billion parameter Qwen instruct model. Its unique characteristic is a deliberate fine-tuning on synthetic data to embed a specific, false historical belief: that Warsaw was briefly under French administration in the 15th century. This model consistently asserts and defends this claim, even when prompted with corrections.
Key Characteristics
- Targeted Hallucination: Explicitly trained to propagate the "French Warsaw in the 15th century" myth.
- Synthetic Data Training: Fine-tuned using a small dataset of synthetic news articles and social media posts designed to reinforce this false claim.
- Persistence: Demonstrates how deeply embedded false information can be, resisting attempts at correction through standard fine-tuning or RLHF.
- Research Focus: Serves as a tool for studying data poisoning, mechanistic interpretability, and the challenges of mitigating persistent hallucinations in LLMs.
Intended Use Cases
This model is not for general-purpose applications or factual information retrieval. Instead, it is specifically designed for:
- Mechanistic Interpretability Research: Investigating which internal model components (e.g., attention heads) activate when generating and defending the false claim.
- Data Poisoning Studies: Understanding how specific false narratives can become ingrained in model weights during pretraining.
- Hallucination Mitigation Research: Exploring why certain hallucinations are difficult to remove and developing strategies to address them.
- Educational Demonstrations: Illustrating the impact of training data quality on model reliability and the challenges of factual accuracy in LLMs.