Overview
lmolino/extractor_abreviaciones is a fine-tuned variant of Llama 3.2 1B Instruct that detects medical abbreviations in Spanish clinical texts. It was developed by lmolino as part of SimpliMed, a system for simplifying cardiology hospital discharge reports. The model identifies abbreviations, acronyms, and medical symbols and returns them as structured JSON.
Key Capabilities
- Specialized Detection: Identifies three kinds of medical shorthand: abbreviations (e.g., "a.c."), acronyms (e.g., "EPOC"), and medical symbols (e.g., "mmHg").
- High Performance: Reaches an F1 score of 0.9024, compared with 0.6704 for traditional regular-expression methods.
- Domain-Specific Training: Trained on hospital discharge reports from the Hospital Universitario de Jaén (cardiology specialty), using a reference dictionary of 7,054 medical abbreviations from SEDOM.
- Structured Output: Provides results in a clear JSON format, separating abbreviated forms and medical symbols.
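
As a rough illustration of consuming the structured output, the sketch below parses a model response into two lists. The key names `abbreviations` and `symbols` are assumptions made for this example, as is the `parse_detection` helper; the summary above only states that the JSON separates abbreviated forms from medical symbols, so check the actual schema before relying on these names.

```python
import json
from dataclasses import dataclass, field
from typing import List

# Hypothetical key names: the real output schema may differ.
ABBREV_KEY = "abbreviations"
SYMBOL_KEY = "symbols"


@dataclass
class Detection:
    abbreviations: List[str] = field(default_factory=list)
    symbols: List[str] = field(default_factory=list)


def _as_strings(value) -> List[str]:
    # Keep only string items; anything else is treated as malformed.
    if not isinstance(value, list):
        return []
    return [item for item in value if isinstance(item, str)]


def parse_detection(raw: str) -> Detection:
    """Parse the span between the first '{' and the last '}' as JSON."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return Detection()
    try:
        payload = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return Detection()
    if not isinstance(payload, dict):
        return Detection()
    return Detection(
        abbreviations=_as_strings(payload.get(ABBREV_KEY)),
        symbols=_as_strings(payload.get(SYMBOL_KEY)),
    )
```

Returning an empty `Detection` on malformed output, rather than raising, is a design choice for batch processing; a stricter pipeline might prefer to surface the error.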
Good For
- Preprocessing medical reports for further analysis.
- Simplifying clinical texts to improve readability.
- Normalizing medical terminology across documents.
- Assisting in doctor-patient communication by clarifying complex terms.
- Analyzing the quality of clinical documentation.
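
For the simplification use case, one possible downstream step is to annotate each detected abbreviation with its expansion. The model itself only detects abbreviations, so the sketch below assumes a separate glossary; the `expand_abbreviations` helper and the two glossary entries are illustrative and not part of the model or of SimpliMed.

```python
import re
from typing import Dict, Iterable


def expand_abbreviations(text: str, detected: Iterable[str],
                         glossary: Dict[str, str]) -> str:
    """Append the expansion in parentheses after each detected abbreviation."""
    # Handle longer abbreviations first so a short form never matches
    # inside a longer one that is still unprocessed.
    for abbr in sorted(set(detected), key=len, reverse=True):
        expansion = glossary.get(abbr)
        if expansion is None:
            continue  # no known expansion: leave the text untouched
        # Match the abbreviation only when it is not part of a larger word.
        pattern = re.compile(r"(?<!\w)" + re.escape(abbr) + r"(?!\w)")
        text = pattern.sub(lambda m: f"{m.group(0)} ({expansion})", text)
    return text


# Illustrative glossary; a real system would use a curated dictionary.
glossary = {
    "EPOC": "enfermedad pulmonar obstructiva crónica",
    "HTA": "hipertensión arterial",
}
print(expand_abbreviations("Paciente con EPOC y HTA.", ["EPOC", "HTA"], glossary))
```

This simple version does not guard against an expansion that itself contains another abbreviation, and it assumes each abbreviation has a single meaning.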
Limitations
- Domain Specificity: The training data comes from cardiology only. The design is not tied to one specialty, but performance on other specialties may differ.
- Format Sensitivity: Text written entirely in capitals can reduce accuracy, because the typographical cues that help identify abbreviations are lost.
- Contextual Ambiguity: The model does not disambiguate abbreviations with multiple meanings (e.g., "HTP"); additional modules are required for that.
- Dictionary Dependence: Relies on the 7,054-entry SEDOM dictionary for validation.
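
The last two limitations can be made concrete with a small triage step that checks detected candidates against a reference dictionary and flags entries with more than one meaning. This is a minimal sketch under stated assumptions: the `triage` helper is hypothetical, and the toy dictionary stands in for the SEDOM dictionary, whose actual format is not described here.

```python
from typing import Dict, Iterable, List, Tuple


def triage(candidates: Iterable[str],
           dictionary: Dict[str, List[str]]) -> Tuple[List[str], List[str], List[str]]:
    """Split candidates into (confirmed, ambiguous, unknown)."""
    confirmed: List[str] = []
    ambiguous: List[str] = []
    unknown: List[str] = []
    for candidate in candidates:
        meanings = dictionary.get(candidate, [])
        if not meanings:
            unknown.append(candidate)    # not in the dictionary
        elif len(meanings) > 1:
            ambiguous.append(candidate)  # needs a disambiguation module
        else:
            confirmed.append(candidate)
    return confirmed, ambiguous, unknown


# Toy stand-in for the reference dictionary (illustrative entries only).
toy_dictionary = {
    "EPOC": ["enfermedad pulmonar obstructiva crónica"],
    "HTP": ["hipertensión pulmonar", "hipertensión portal"],
}
print(triage(["EPOC", "HTP", "XYZ"], toy_dictionary))
```

Candidates that land in the ambiguous list are the ones a separate, context-aware module would need to resolve.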