Overview
Clinical-BR-Mistral-7B-v0.2 is a specialized 7-billion-parameter language model, part of the MED-LLM-BR project by HAILab and Comsentimento. It is fine-tuned from Mistral 7B with the explicit goal of generating clinical notes in Portuguese.
Key Capabilities
- Clinical Note Generation: Specifically adapted to produce accurate and contextually relevant clinical documentation in Portuguese.
- Medical Language Nuance: Designed to handle the complexities and specific terminology of medical language in a Portuguese context.
- Efficient Fine-Tuning: Utilizes LoRA (Low-Rank Adaptation) with 16-bit precision on the q_proj and v_proj projections, configured with R=8, Alpha=16, and Dropout=0.1, for memory-efficient adaptation.
- Optimized Training: Employs the AdamW optimizer (β1=0.9, β2=0.999) to ensure stable and convergent training.
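To give a sense of what the LoRA settings above imply, the following is a minimal sketch in plain Python that computes the LoRA scaling factor (Alpha / R) and the approximate number of trainable adapter parameters. The `LoraSettings` class and the helper function are hypothetical names introduced for illustration. The projection shapes and layer count are assumptions taken from the publicly released Mistral 7B base configuration (hidden size 4096, 8 key/value heads of dimension 128, 32 layers), not values stated in this model card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Hypothetical container for the LoRA hyperparameters listed above."""
    r: int = 8
    alpha: int = 16
    dropout: float = 0.1
    target_modules: tuple = ("q_proj", "v_proj")

    @property
    def scaling(self) -> float:
        # LoRA scales the low-rank update B @ A by alpha / r.
        return self.alpha / self.r


# ASSUMPTION: (in_features, out_features) of each projection in the
# Mistral 7B base model; v_proj is smaller due to grouped-query attention.
ASSUMED_SHAPES = {"q_proj": (4096, 4096), "v_proj": (4096, 1024)}
# ASSUMPTION: number of transformer layers in the base model.
ASSUMED_NUM_LAYERS = 32


def lora_trainable_params(cfg, shapes, num_layers):
    """Count adapter weights: each target gets A (r x in) and B (out x r)."""
    per_layer = sum(
        cfg.r * (shapes[name][0] + shapes[name][1])
        for name in cfg.target_modules
    )
    return per_layer * num_layers


cfg = LoraSettings()
total = lora_trainable_params(cfg, ASSUMED_SHAPES, ASSUMED_NUM_LAYERS)
print(f"scaling = {cfg.scaling}, trainable LoRA parameters = {total:,}")
```

Under these assumed shapes, the adapters add roughly 3.4 million trainable parameters, well under 0.1% of a 7-billion-parameter model, which is why this setup is memory-efficient.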
Training Data
The model was fine-tuned on 2.4 GB of clinical text from three distinct datasets:
- SemClinBr project: Provided diverse clinical narratives from Brazilian hospitals.
- BRATECA dataset: Contributed admission notes from various departments across 10 hospitals.
- Lopes et al., 2019 data: Included neurology-focused texts from European Portuguese medical journals.
Use Cases
This model is particularly well suited to applications that generate clinical notes and documentation automatically, or that assist healthcare professionals in drafting them, in Portuguese-speaking environments.