NingLab/CASLIE-S

TEXT GENERATIONConcurrency Cost:1Model Size:3.2BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Oct 21, 2024License:cc-by-4.0Architecture:Transformer Open Weights Cold

NingLab/CASLIE-S is a 3.2 billion parameter instruction-tuned model developed by NingLab, based on the Llama-3.2-3B-Instruct architecture. It is specifically designed for e-commerce applications, leveraging high-quality multimodal instruction data to generalize foundation models. This model excels at tasks where captions and image context are crucial for understanding e-commerce related queries.

Loading preview...

CASLIE-S: E-commerce Optimized Multimodal Instruction Model

CASLIE-S is a 3.2 billion parameter instruction-tuned model developed by NingLab, specifically designed to enhance foundation models for e-commerce applications. It is built upon the Llama-3.2-3B-Instruct base model, indicating its strong language understanding capabilities.

Key Capabilities

  • E-commerce Specialization: Optimized for tasks within the e-commerce domain, leveraging a unique approach where "Captions Speak Louder than Images" (CASLIE).
  • Multimodal Instruction Tuning: Benefits from high-quality multimodal instruction data, enabling it to process and understand information where both textual captions and visual context are important.
  • Generalization: Aims to generalize foundation models for various e-commerce scenarios, suggesting adaptability across different product categories and user queries.

Good For

  • E-commerce AI applications: Ideal for developers building AI solutions that require a deep understanding of product descriptions, user queries, and visual information in an e-commerce context.
  • Research in multimodal learning: Useful for researchers exploring the integration of textual and visual data, particularly in specialized domains like e-commerce.

This model is a result of the research detailed in the paper "Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data" by Ling et al. (2024).