delta-lab-ai/lean-finder

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:May 21, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

Lean Finder by delta-lab-ai is an 8 billion parameter semantic search engine for Lean and mathlib, designed to understand and align with mathematicians' intents. It utilizes user-centered query synthesis and diverse input modalities, trained on over 1.4 million query-code pairs. This model excels at retrieving relevant theorems and formal statements, offering significant improvements over previous search engines and GPT-4o for formal theorem proving.

Loading preview...

Lean Finder: User-Centered Semantic Search for Mathlib

Lean Finder is an 8 billion parameter semantic search engine developed by delta-lab-ai, specifically designed for the Lean theorem prover and its mathlib library. Published at ICLR 2026, this model addresses the challenge of locating relevant theorems and navigating the Lean 4 language by focusing on mathematicians' real-world query intents, rather than just informalized statements.

Key Capabilities & Innovations

  • User-Centered Query Synthesis: The model analyzes and clusters public Lean discussion semantics, then fine-tunes text embeddings on synthesized queries that emulate diverse user intents.
  • Diverse Input Modalities: It supports informalized statements, synthetic user queries, augmented proof states, and formal statements, having been trained on over 1.4 million query-code pairs.
  • Preference Alignment: Lean Finder is further aligned with mathematicians' preferences through DPO, utilizing human and LLM feedback collected from a deployed web service.

Performance Highlights

  • Achieved an 81.6% Top-3 preference rate in a user study involving real Lean users.
  • Demonstrates a 30%+ relative improvement (Recall@1) over existing search engines and GPT-4o for informalized statement queries.
  • Shows a 16%+ relative improvement on proof state queries.
  • Can serve as a plug-and-play retrieval backbone for LLM-based theorem provers.

When to Use This Model

  • Mathematicians and Lean Users: Ideal for efficiently finding relevant theorems and definitions within mathlib.
  • Developers of Theorem Provers: Useful as a retrieval component to enhance LLM-based theorem proving systems.
  • Researchers in Formal Verification: Provides a robust tool for semantic search that understands complex mathematical intents.