nnethercott/llava-v1.5-7b_vicuna: A Multimodal Instruction-Following Model
This model is a 7-billion-parameter LLaVA variant, fine-tuned from liuhaotian/llava-v1.5-7b and packaged for LLM benchmarking. It is an auto-regressive transformer language model built on the LLaMA/Vicuna base.
Key Capabilities
- Multimodal Instruction Following: Trained on GPT-generated multimodal instruction-following data, enabling it to process and respond to instructions involving both text and images.
- Vision Integration: Inherits LLaVA's ability to understand visual inputs, making it suitable for tasks like Visual Question Answering (VQA).
- Training Data: Fine-tuned using a diverse dataset including 558K filtered image-text pairs, 158K GPT-generated multimodal instruction data, 450K academic-task-oriented VQA data, and 40K ShareGPT data.
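For instruction following at inference time, LLaVA v1.5 models expect a Vicuna-style conversation template with an `<image>` placeholder marking where image tokens are inserted. The sketch below shows that format; the system message matches the template used in the upstream LLaVA repository, but treat its exact wording as an assumption for this particular fine-tune.

```python
# Sketch of the Vicuna-style single-turn prompt used by LLaVA v1.5.
# The system message is the upstream default; verify it against this
# fine-tune before relying on it.
DEFAULT_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions."
)

def build_llava_prompt(question: str, system: str = DEFAULT_SYSTEM) -> str:
    """Format a single-turn VQA prompt; <image> marks the image-token slot."""
    return f"{system} USER: <image>\n{question} ASSISTANT:"

prompt = build_llava_prompt("What is shown in this image?")
print(prompt)
```

At inference, this prompt string would typically be passed, together with the image, to the model's processor and then to `generate`; the model completes the text after `ASSISTANT:`.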
Benchmarking Performance
Evaluated on the Open LLM Leaderboard, the model achieved an average score of 52.28. Individual results:
- HellaSwag (10-shot): 76.09
- Winogrande (5-shot): 72.06
- MMLU (5-shot): 51.68
- GSM8k (5-shot): 15.31

The strong HellaSwag and Winogrande results indicate solid commonsense reasoning.
Good For
- LLM Benchmarking: Ideal for evaluating multimodal capabilities and instruction following in a 7B parameter model.
- Research and Development: Useful for exploring multimodal AI applications based on the LLaVA architecture.
- Multimodal Understanding Tasks: Suited to tasks that combine visual and textual inputs into a single generated response.