MPropositioneur-V2-large: Atomic Proposition Extraction
MPropositioneur-V2-large is a 4-billion-parameter language model built on the Qwen3-4B architecture and developed by Luc Pommeret at LISN (CNRS). It is specialized in atomic proposition extraction: decomposing complex sentences or passages into a list of simple, independent, and semantically faithful statements. It was trained by distillation for this specific task.
Compared to its standard 0.6B-parameter counterpart, this "large" version offers higher semantic fidelity and stronger reasoning on intricate sentence structures.
Key Capabilities
- Atomic Proposition Extraction: Decomposes complex text into a list of simple, decontextualized assertions.
- Semantic Fidelity: Maintains the original meaning while simplifying propositions.
- Multilingual Support: Trained to handle multiple languages, including French and English.
- Improved Reasoning: Better handles complex sentence structures than smaller versions.
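Since the model returns a list of propositions, downstream code usually needs that list as structured data. The sketch below is a minimal post-processing helper, not part of the model or its official tooling. The output formats it accepts (a JSON array of strings, or one proposition per line with an optional bullet or number prefix) are assumptions, and `parse_propositions` is a hypothetical name; adjust it to whatever the model actually emits.

```python
import json
import re


def parse_propositions(raw: str) -> list[str]:
    """Turn raw generated text into a clean list of propositions.

    Assumed formats (hypothetical, not confirmed by the model card):
    a JSON array of strings, or one proposition per line with an
    optional bullet ("-", "*", "•") or number ("1.", "2)") prefix.
    """
    text = raw.strip()

    # Case 1: the output looks like a JSON array of strings.
    if text.startswith("["):
        try:
            items = json.loads(text)
            if isinstance(items, list):
                return [str(i).strip() for i in items if str(i).strip()]
        except json.JSONDecodeError:
            pass  # not valid JSON: fall through to line-based parsing

    # Case 2: one proposition per line; strip list markers and blank lines.
    props = []
    for line in text.splitlines():
        cleaned = re.sub(r"^\s*(?:[-*•]\s+|\d+[.)]\s+)", "", line).strip()
        if cleaned:
            props.append(cleaned)
    return props
```

The marker pattern requires whitespace after the bullet or number, so a proposition that starts with a figure such as "3.5 percent" is left intact.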
Good for
- Retrieval-Augmented Generation (RAG): Enhances RAG systems by indexing atomic propositions for more granular information retrieval.
- Open Information Extraction (OpenIE): Facilitates the extraction of structured information from unstructured text.
- Text Simplification: Aids in simplifying complex texts for better readability and analysis.
- Discourse Analysis: Supports detailed analysis of textual discourse by breaking it into fundamental units.
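To illustrate the RAG use case above, here is a minimal sketch of indexing at the proposition level, where each proposition keeps a pointer back to its source passage. It is a toy under stated assumptions: `PropositionIndex` and `tokenize` are hypothetical names, the scoring is plain Jaccard word overlap standing in for a real embedding model, and the propositions are hand-written for illustration rather than produced by the model.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


class PropositionIndex:
    """Toy index that stores propositions with the id of their source passage."""

    def __init__(self) -> None:
        # Each entry: (proposition text, source id, token set)
        self.entries: list[tuple[str, str, set[str]]] = []

    def add(self, source_id: str, propositions: list[str]) -> None:
        for prop in propositions:
            self.entries.append((prop, source_id, tokenize(prop)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, str]]:
        """Return up to k (proposition, source_id) pairs ranked by Jaccard overlap."""
        q = tokenize(query)
        scored = []
        for prop, source_id, tokens in self.entries:
            overlap = len(q & tokens)
            if overlap:
                scored.append((overlap / len(q | tokens), prop, source_id))
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(prop, source_id) for _, prop, source_id in scored[:k]]


# Hand-written propositions, for illustration only.
index = PropositionIndex()
index.add("doc1", [
    "Marie Curie was born in Warsaw.",
    "Marie Curie won the Nobel Prize in Physics in 1903.",
])
index.add("doc2", ["The Eiffel Tower is located in Paris."])

for prop, source_id in index.search("Where was Marie Curie born?"):
    print(source_id, "->", prop)
```

Because each entry is a single self-contained statement, a query can match the exact fact it needs, and the stored source id lets the system fetch the full passage when more context is required.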