NECOUDBFM/Jellyfish-13B

Text Generation · Model Size: 13B · Quant: FP8 · Context Length: 4K · Published: Oct 16, 2023 · License: cc-by-nc-4.0 · Architecture: Transformer

NECOUDBFM/Jellyfish-13B is a 13 billion parameter large language model developed by Haochen Zhang, Yuyang Dong, Chuan Xiao, and Masafumi Oyamada, fine-tuned from Open-Orca/OpenOrca-Platypus2-13B. It is specifically tailored for data preprocessing tasks, including error detection, data imputation, schema matching, and entity matching. The model offers cost-effective local execution and maintains strong performance in general NLP tasks. Jellyfish-13B is designed to deliver precise, straightforward answers suitable for integration into data management systems.


Jellyfish-13B: Specialized for Data Preprocessing

Jellyfish-13B is a 13 billion parameter large language model developed by Haochen Zhang, Yuyang Dong, Chuan Xiao, and Masafumi Oyamada, fine-tuned from the Open-Orca/OpenOrca-Platypus2-13B model. Its core specialization lies in data preprocessing tasks, offering competitive performance against state-of-the-art algorithms and larger LLMs like GPT-3.5 and GPT-4.

Key Capabilities

  • Data Preprocessing: Excels in error detection, data imputation, schema matching, and entity matching.
  • Cost-Effective: Its 13B size makes local execution practical, so sensitive data never has to leave your own infrastructure.
  • Dual Versions: Available in two distinct versions:
    • Jellyfish-13B (main branch): Designed for precise, straightforward answers, ideal for integration into data management systems where responses can be easily transformed into code.
    • Jellyfish-13B-Interpreter (alternative branch): Fine-tuned with reasoning and sequential thought processes, distilling knowledge from GPT-4, making it more user-oriented with in-depth data insights.
  • Strong NLP Performance: Maintains robust performance in general NLP tasks, as evidenced by benchmark comparisons.
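Because the main-branch model is tuned to return precise, single-token-style answers, a preprocessing query such as entity matching can be phrased as a short instruction ending in "Answer:". The sketch below shows one way to build such a prompt; the template wording and field layout are illustrative assumptions, not the exact format the Jellyfish authors fine-tuned with (consult the model card for that).

```python
def build_entity_matching_prompt(record_a: dict, record_b: dict) -> str:
    """Build an illustrative entity-matching instruction for a
    Jellyfish-style model. Hypothetical template, not the official one."""

    def fmt(record: dict) -> str:
        # Serialize a record as "key: value" pairs, one common convention
        # in LLM-based entity-matching prompts.
        return ", ".join(f"{k}: {v}" for k, v in record.items())

    return (
        "Determine whether the two records refer to the same real-world "
        "entity. Answer only with Yes or No.\n"
        f"Record A: [{fmt(record_a)}]\n"
        f"Record B: [{fmt(record_b)}]\n"
        "Answer:"
    )


prompt = build_entity_matching_prompt(
    {"name": "iPhone 13 Pro", "price": "999"},
    {"name": "Apple iPhone 13 Pro 128GB", "price": "$999.00"},
)
print(prompt)
```

Constraining the answer space ("Answer only with Yes or No") is what makes the response easy to consume downstream, which is the design goal the main branch is described as serving.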

Performance Highlights

Jellyfish-13B demonstrates strong performance across data preprocessing tasks, often rivaling or surpassing larger models and specialized algorithms. For instance, it achieved 99.33% on error detection (Adult dataset) and 100% on data imputation (Buy dataset). Averaged across a suite of seen data preprocessing tasks, it scored 86.02%, ahead of GPT-4's 84.17%.

Good for

  • Data Management Systems: Jellyfish-13B's precise responses are well-suited for automated data cleaning and preparation pipelines.
  • Data Analysts & Scientists: Jellyfish-13B-Interpreter provides detailed insights for users without advanced coding skills.
  • Local Deployment: Its 13B size allows for efficient local execution, addressing data security and cost concerns.
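Since the main branch is designed so that "responses can be easily transformed into code", a data-management pipeline can map the model's short reply onto a typed value. A minimal sketch of such a parser (a hypothetical helper, not part of the Jellyfish release):

```python
from typing import Optional


def parse_yes_no(response: str) -> Optional[bool]:
    """Map a concise model reply onto a boolean for pipeline use.

    Returns True for a 'Yes'-style answer, False for 'No', and None
    when the reply is unrecognized, so callers can flag it for review.
    """
    token = response.strip().rstrip(".").lower()
    if token.startswith("yes"):
        return True
    if token.startswith("no"):
        return False
    return None


# Example: feed the model's answer into an entity-matching pipeline.
is_match = parse_yes_no("Yes.")  # True
```

Returning `None` for unexpected output, rather than guessing, keeps an automated cleaning pipeline from silently acting on a malformed response.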

For more details, refer to the Jellyfish paper.