NECOUDBFM/Jellyfish-8B
  • Task: Text generation
  • Model size: 8B
  • Quantization: FP8
  • Context length: 8k
  • Published: Apr 23, 2024
  • License: cc-by-nc-4.0
  • Architecture: Transformer

Jellyfish-8B is an 8 billion parameter large language model developed by Haochen Zhang, Yuyang Dong, Chuan Xiao, and Masafumi Oyamada, funded by NEC Corporation and Osaka University. Fine-tuned from Meta-Llama-3-8B-Instruct, this model specializes in data preprocessing tasks such as error detection, data imputation, schema matching, and entity matching. It demonstrates strong performance across various data cleaning benchmarks, often outperforming or competing with larger models like GPT-3.5 and GPT-4 on specific tasks.


Overview

NECOUDBFM/Jellyfish-8B is an 8 billion parameter large language model, fine-tuned from Meta-Llama-3-8B-Instruct by Haochen Zhang, Yuyang Dong, Chuan Xiao, and Masafumi Oyamada. Developed with funding from NEC Corporation and Osaka University, this model is specifically designed for data preprocessing tasks. It is part of a family of Jellyfish models, with other sizes including Jellyfish-7B and Jellyfish-13B.

Key Capabilities

  • Error Detection: Identifies errors in specific attribute values within records, including spelling errors, inconsistencies, or illogical values.
  • Data Imputation: Infers missing attribute values based on available information within a record.
  • Schema Matching: Determines semantic equivalence between two attributes (columns) for table merging.
  • Entity Matching: Compares two records to determine if they represent the same entity.
  • Column Type Annotation & Attribute Value Extraction: Also performs well on these unseen tasks, as detailed in the benchmarks.
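To make the entity-matching task above concrete, the sketch below builds a yes/no prompt from two records. The template wording and the `build_entity_matching_prompt` helper are illustrative assumptions, not Jellyfish's official prompt format.

```python
def build_entity_matching_prompt(record_a: dict, record_b: dict) -> str:
    """Build an illustrative entity-matching prompt for two records.

    Note: this wording is a hypothetical template, not the prompt
    format Jellyfish was trained on.
    """
    def render(record: dict) -> str:
        # Serialize a record as "attribute: value" lines.
        return "\n".join(f"{k}: {v}" for k, v in record.items())

    return (
        "You are tasked with determining whether the two records below "
        "refer to the same real-world entity.\n\n"
        f"Record A:\n{render(record_a)}\n\n"
        f"Record B:\n{render(record_b)}\n\n"
        "Answer with 'Yes' or 'No'."
    )


prompt = build_entity_matching_prompt(
    {"title": "iPhone 13 128GB", "brand": "Apple"},
    {"title": "Apple iPhone 13 (128 GB)", "brand": "Apple"},
)
print(prompt)
```

The same record-serialization pattern extends naturally to the other tasks (e.g., listing one record and asking whether a given attribute value contains an error).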

Performance Highlights

The Jellyfish-8B model shows competitive performance against models like GPT-3.5 and GPT-4 on various data preprocessing benchmarks. For instance, on Entity Matching tasks it achieves 81.42% F1 on Amazon-Google and 100% F1 on Beer. While its performance varies across tasks, it often provides strong results, particularly on seen tasks such as data imputation and entity matching. The model was trained using LoRA, targeting the q_proj, k_proj, v_proj, and o_proj modules for efficient fine-tuning.
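The LoRA setup described above can be sketched with the `peft` library. Only the target modules come from the model card; the rank, alpha, and dropout values here are illustrative placeholders, not the authors' actual training hyperparameters.

```python
from peft import LoraConfig

# Sketch of a LoRA configuration targeting the attention projection
# modules named in the model card. r, lora_alpha, and lora_dropout
# are placeholder values, not the reported training settings.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

Restricting adaptation to these projection matrices keeps the number of trainable parameters small relative to full fine-tuning of an 8B model.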

When to Use

Jellyfish-8B is well suited to applications requiring automated data cleaning and preparation, especially for structured and semi-structured data. Its specialized fine-tuning makes it a strong candidate for tasks like ensuring data quality, integrating datasets, and preparing data for further analysis or machine learning. The authors recommend running Jellyfish with vLLM for accelerated inference.
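A minimal vLLM inference sketch is shown below. The prompt is an illustrative data-imputation question, and sampling settings are placeholders; running it requires a GPU with the model weights available.

```python
from vllm import LLM, SamplingParams

# Load Jellyfish-8B with vLLM for accelerated, batched inference.
llm = LLM(model="NECOUDBFM/Jellyfish-8B")

# Greedy decoding keeps short classification-style answers deterministic.
params = SamplingParams(temperature=0.0, max_tokens=16)

prompt = (
    "Is there an error in the attribute 'city' of the record "
    "{name: 'Blue Bottle Coffee', city: 'Oaklnd'}? Answer Yes or No."
)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```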