delta-lab-ai/lean-finder
Lean Finder by delta-lab-ai is an 8 billion parameter semantic search engine for Lean and mathlib, designed to understand and align with mathematicians' intents. It utilizes user-centered query synthesis and diverse input modalities, trained on over 1.4 million query-code pairs. This model excels at retrieving relevant theorems and formal statements, offering significant improvements over previous search engines and GPT-4o for formal theorem proving.
Loading preview...
Lean Finder: User-Centered Semantic Search for Mathlib
Lean Finder is an 8 billion parameter semantic search engine developed by delta-lab-ai, specifically designed for the Lean theorem prover and its mathlib library. Published at ICLR 2026, this model addresses the challenge of locating relevant theorems and navigating the Lean 4 language by focusing on mathematicians' real-world query intents, rather than just informalized statements.
Key Capabilities & Innovations
- User-Centered Query Synthesis: The model analyzes and clusters public Lean discussion semantics, then fine-tunes text embeddings on synthesized queries that emulate diverse user intents.
- Diverse Input Modalities: It supports informalized statements, synthetic user queries, augmented proof states, and formal statements, having been trained on over 1.4 million query-code pairs.
- Preference Alignment: Lean Finder is further aligned with mathematicians' preferences through DPO, utilizing human and LLM feedback collected from a deployed web service.
Performance Highlights
- Achieved an 81.6% Top-3 preference rate in a user study involving real Lean users.
- Demonstrates a 30%+ relative improvement (Recall@1) over existing search engines and GPT-4o for informalized statement queries.
- Shows a 16%+ relative improvement on proof state queries.
- Can serve as a plug-and-play retrieval backbone for LLM-based theorem provers.
When to Use This Model
- Mathematicians and Lean Users: Ideal for efficiently finding relevant theorems and definitions within mathlib.
- Developers of Theorem Provers: Useful as a retrieval component to enhance LLM-based theorem proving systems.
- Researchers in Formal Verification: Provides a robust tool for semantic search that understands complex mathematical intents.