Model Overview
nnethercott/llava-v1.5-7b-hf-vicuna is a 7-billion-parameter vision-language model (VLM) derived from llava-hf/llava-1.5-7b-hf. It is an auto-regressive transformer language model fine-tuned from LLaMA/Vicuna. The model is released primarily to support LLM benchmarking, particularly on tasks that require multimodal understanding.
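As a LLaVA-1.5 derivative, the model expects the Vicuna-style chat template, in which an `<image>` placeholder marks where the processor injects visual tokens. A minimal sketch of the prompt construction (the helper name is illustrative, not part of the model's API):

```python
def build_llava_prompt(question: str) -> str:
    """Build a Vicuna-style LLaVA-1.5 prompt string.

    The <image> placeholder is expanded into visual tokens by the
    LLaVA processor at inference time; the model then generates the
    assistant turn after "ASSISTANT:".
    """
    return f"USER: <image>\n{question} ASSISTANT:"

prompt = build_llava_prompt("What is shown in this image?")
print(prompt)
```

At inference time this string would be passed, together with a PIL image, to the matching Hugging Face processor for the checkpoint.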
Key Capabilities
- Multimodal Instruction Following: LLaVA is trained to follow instructions that involve both text and images, enabling it to understand and generate responses in a multimodal context.
- Vision-Language Integration: It pairs a Vicuna-based language model with a vision encoder, projecting visual features into the language model's embedding space so that images and text can be reasoned over jointly.
- Benchmarking Foundation: The model is provided to support comprehensive evaluation of multimodal AI systems.
Training Details
The model was fine-tuned on a data mixture comprising:
- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
- 158K GPT-generated multimodal instruction-following data.
- 450K academic-task-oriented VQA data mixture.
- 40K ShareGPT data.
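Taken together, the mixture above comes to roughly 1.2M fine-tuning examples. A quick tally (source names abbreviated for readability):

```python
# Approximate sizes of the fine-tuning data sources, in thousands of examples,
# as listed in the model card above.
mixture_k = {
    "LAION/CC/SBU (BLIP-captioned pairs)": 558,
    "GPT-generated instruction data": 158,
    "Academic VQA mixture": 450,
    "ShareGPT": 40,
}

total_k = sum(mixture_k.values())
print(f"~{total_k}K examples total")  # roughly 1.2M examples
```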
Performance Highlights
Evaluated on the Open LLM Leaderboard, nnethercott/llava-v1.5-7b-hf-vicuna achieved an average score of 52.28 across the leaderboard's benchmark suite. Individual scores include:
- HellaSwag (10-Shot): 76.09
- AI2 Reasoning Challenge (25-Shot): 52.65
- MMLU (5-Shot): 51.68
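The leaderboard average is an unweighted mean over the full benchmark suite, not just the tasks listed here. For illustration, the mean of only the three listed scores (a subset, so it differs from the reported full-suite average of 52.28) can be computed as:

```python
# Scores for the three benchmarks listed above (a subset of the full suite).
listed_scores = {
    "HellaSwag (10-shot)": 76.09,
    "ARC (25-shot)": 52.65,
    "MMLU (5-shot)": 51.68,
}

# Unweighted (macro) mean over the listed subset only.
subset_mean = sum(listed_scores.values()) / len(listed_scores)
print(f"{subset_mean:.2f}")  # 60.14 for the subset vs. 52.28 for the full suite
```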
Good for
- Multimodal AI Research: Ideal for researchers and developers exploring vision-language models and their applications.
- Benchmarking: Suitable for evaluating the performance of LLMs on tasks that require understanding both visual and textual inputs.
- Instruction-Following Tasks: Can be used for tasks where the model needs to interpret and act upon multimodal instructions.