iamaber/mistral-7b-pubmedqa-lora-plus
The iamaber/mistral-7b-pubmedqa-lora-plus model is a 7-billion-parameter variant of Mistral-7B-Instruct-v0.3, fine-tuned with LoRA+ on the PubMedQA dataset. It is optimized for medical question answering, particularly 'yes', 'no', or 'maybe' questions grounded in medical literature, and reaches an accuracy of 0.4500 on the PubMedQA evaluation set.
Model Overview
The iamaber/mistral-7b-pubmedqa-lora-plus is a specialized language model built upon the mistralai/Mistral-7B-Instruct-v0.3 architecture. It has been fine-tuned using the LoRA+ method on the qiaojin/PubMedQA dataset, specifically the pqa_labeled subset. This targeted training aims to enhance its performance in medical question-answering scenarios.
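A minimal usage sketch follows. The prompt template, the example question, and the generation settings are assumptions for illustration, not taken from the model card; only the base-model and adapter identifiers come from the card. The LoRA adapter is loaded on top of the base model with the `peft` library.

```python
def build_prompt(question: str, context: str) -> str:
    """Format a PubMedQA-style yes/no/maybe question (assumed template)."""
    return (
        "Answer the biomedical question with 'yes', 'no', or 'maybe' "
        "based only on the context.\n\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

if __name__ == "__main__":
    # Requires `transformers` and `peft`; downloads several GB of weights.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = "mistralai/Mistral-7B-Instruct-v0.3"
    adapter = "iamaber/mistral-7b-pubmedqa-lora-plus"

    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(
        base, torch_dtype=torch.float16, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, adapter)  # attach the LoRA weights

    prompt = build_prompt(
        "Does low-dose aspirin reduce cardiovascular risk?",
        "Several trials report fewer cardiovascular events with low-dose aspirin.",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=4, do_sample=False)
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True))
```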
Key Capabilities and Performance
- Medical Question Answering: The model is designed to answer questions within the biomedical domain, particularly those requiring a 'yes', 'no', or 'maybe' response, as evidenced by its training on PubMedQA.
- PubMedQA Accuracy: Achieves an accuracy of 0.4500 on the PubMedQA evaluation set, with a macro F1 score of 0.2069 and a weighted F1 score of 0.2793.
- Training Details: Fine-tuned over 3 epochs with 900 training examples and 100 evaluation examples, using a learning rate of 5e-05.
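The LoRA+ method referenced above assigns the adapter's B matrices a larger learning rate than its A matrices. The sketch below illustrates that idea only; the ratio of 16 is the default suggested in the LoRA+ paper, and the parameter-naming convention (`lora_A`/`lora_B`) follows `peft` — neither is reported in this model card.

```python
BASE_LR = 5e-5          # learning rate reported in the model card
LORAPLUS_LR_RATIO = 16  # assumed ratio (lr_B = ratio * lr_A)

def loraplus_param_groups(named_params, base_lr=BASE_LR, ratio=LORAPLUS_LR_RATIO):
    """Split LoRA parameters into optimizer groups with separate learning rates.

    `named_params` is an iterable of (name, param) pairs, as returned by
    `model.named_parameters()` in PyTorch.
    """
    group_a = {"params": [], "lr": base_lr}          # lora_A matrices
    group_b = {"params": [], "lr": base_lr * ratio}  # lora_B matrices
    for name, param in named_params:
        if "lora_B" in name:
            group_b["params"].append(param)
        elif "lora_A" in name:
            group_a["params"].append(param)
    return [group_a, group_b]
```

The resulting groups would be passed to an optimizer such as `torch.optim.AdamW(loraplus_param_groups(model.named_parameters()))`.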
Limitations
- Medical MMLU Performance: The model shows limited performance on broader medical knowledge, with a Medical MMLU accuracy of 0.1600 across 50 samples. Specific subjects like clinical knowledge, college medicine, medical genetics, professional medicine, and virology showed 0.0000 accuracy in the provided breakdown.
- Confusion Matrix Insights: The PubMedQA confusion matrix indicates a strong bias towards predicting 'yes': all 100 evaluation examples were classified as 'yes', yielding 45 correct 'yes' predictions but misclassifying all 40 'no' and all 15 'maybe' questions as 'yes'. Any 'no' or 'maybe' response from this model should therefore be interpreted with caution.
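The reported metrics can be reproduced directly from the confusion-matrix description above (all 100 evaluation examples predicted as 'yes': 45 true 'yes', 40 'no', 15 'maybe'), which confirms the accuracy and F1 figures are internally consistent:

```python
def f1(tp, fp, fn):
    """Per-class F1 score; 0.0 when the class has no predictions and no support."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

support = {"yes": 45, "no": 40, "maybe": 15}
# Every prediction is 'yes', so 'yes' collects 40 + 15 false positives,
# while 'no' and 'maybe' are never predicted (their support becomes false negatives).
per_class_f1 = {
    "yes": f1(tp=45, fp=55, fn=0),
    "no": f1(tp=0, fp=0, fn=40),
    "maybe": f1(tp=0, fp=0, fn=15),
}
total = sum(support.values())
accuracy = 45 / total
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
weighted_f1 = sum(per_class_f1[c] * support[c] / total for c in support)
print(round(accuracy, 4), round(macro_f1, 4), round(weighted_f1, 4))
# → 0.45 0.2069 0.2793
```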
Ideal Use Cases
This model is best suited for:
- Preliminary Medical Information Retrieval: Answering straightforward, fact-based questions from medical texts where a 'yes/no/maybe' answer is expected.
- Biomedical Research Support: Assisting researchers in quickly sifting through medical literature for specific answers.
- Specialized QA Systems: Integration into applications focused solely on PubMedQA-style medical inquiries.