InferenceIllusionist/Mistral-RealworldQA-v0.2-7b-SFT
InferenceIllusionist's Mistral-RealworldQA-v0.2-7b-SFT is a 7 billion parameter Mistral-based vision-language model, fine-tuned to reduce hallucinations in Visual Question Answering (VQA) tasks. It specializes in generating concise image captions and providing factual answers to visual queries, leveraging a 4096-token context length. This model is optimized for use cases requiring short, factual descriptions of images with a drier, less conversational tone.
Model Overview
InferenceIllusionist's Mistral-RealworldQA-v0.2-7b-SFT is a 7 billion parameter vision-language model built upon the Mistral-7B-v0.2 base. This model is the second iteration in a series of experiments focused on fine-tuning for image captioning, and it aims to significantly reduce hallucinations in Visual Question Answering (VQA) tasks. It was fine-tuned using the RealWorldQA dataset, originally provided by the xAI team.
Key Capabilities & Characteristics
- Reduced Hallucinations: Specifically designed to minimize inaccurate outputs when answering questions about images.
- Concise Output: Provides shorter, less verbose responses for image-related queries, ideal for low token count requirements.
- Vision Functionality: Requires an additional mmproj file for vision capabilities, with both quantized (197MB) and unquantized (596MB) options available for quality and VRAM considerations.
- Drier Tone: Lacks the conversational prose of other models, offering a more direct and factual communication style.
- Alpaca Prompt Format: Optimized for best results when using the Alpaca prompt format.
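As an illustration of the Alpaca prompt format mentioned above, here is a minimal sketch that builds the text portion of a prompt using the widely used Alpaca template. The function name build_alpaca_prompt and the example instruction are hypothetical, and the sketch assumes the image is attached separately by whatever inference front end loads the model and its mmproj file; the model card does not specify that step.

```python
from typing import Optional

# Standard Alpaca templates. Assumption: only the text part of the prompt is
# built here; attaching the image is left to the inference front end.
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Return an Alpaca-style prompt (hypothetical helper, not part of the model)."""
    instruction = instruction.strip()
    if not instruction:
        raise ValueError("instruction must not be empty")
    # Use the "with input" template only when extra context is actually given.
    if input_text and input_text.strip():
        return ALPACA_WITH_INPUT.format(
            instruction=instruction, input=input_text.strip()
        )
    return ALPACA_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    # Example instruction for a short, factual caption.
    print(build_alpaca_prompt("Describe this image in one sentence."))
```

The prompt ends at the "### Response:" marker so that the model's completion forms the answer. Short, direct instructions such as the one above match the concise, factual output style described in this list.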
Use Cases
This model is best suited for applications requiring:
- Image Captioning: Generating brief and accurate descriptions for images.
- Factual VQA: Answering direct questions about visual content where conciseness and accuracy are prioritized over conversational fluency.
Technical Details
The model was fine-tuned from mistral-community/Mistral-7B-v0.2 using Unsloth and Hugging Face's TRL library for faster training.