ICBU-NPU/FashionGPT-70B-V1

Text Generation · Concurrency Cost: 4 · Model Size: 69B · Quant: FP8 · Ctx Length: 32k · Published: Sep 17, 2023 · License: llama2 · Architecture: Transformer · Open Weights

FashionGPT-70B-V1 by ICBU-NPU is a 69-billion-parameter model based on Llama-2-70B, enhanced with two adapters for improved performance. It was fine-tuned on a combination of Orca-style and Samantha datasets, with a focus on multi-turn conversational data. The model achieves an average score of 73.26 across the ARC, HellaSwag, MMLU, and TruthfulQA benchmarks, making it suitable for general conversational AI applications.


FashionGPT-70B-V1 Overview

FashionGPT-70B-V1 is a 69 billion parameter language model developed by ICBU-NPU, built upon the Llama-2-70B architecture. Its key differentiator is the training approach: two distinct adapters are combined with the base Llama-2-70B model, a method the developers claim outperforms using a single adapter and plan to detail in an upcoming paper.
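The released checkpoint already has the adapters baked in, and the exact merging recipe has not been published. Still, the general technique of combining two LoRA adapters with one base model can be sketched with Hugging Face PEFT. In the example below, the adapter repo names (my-org/adapter-a, my-org/adapter-b) and the equal 0.5/0.5 weighting are hypothetical placeholders, not the developers' actual recipe.

```python
# Illustrative sketch: combining two LoRA adapters with one base model
# via Hugging Face PEFT. Adapter repos and weights are hypothetical --
# ICBU-NPU has not released the individual adapters.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # the base model FashionGPT builds on
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the first adapter, then attach the second under its own name.
model = PeftModel.from_pretrained(base, "my-org/adapter-a", adapter_name="a")
model.load_adapter("my-org/adapter-b", adapter_name="b")

# Linearly combine the two adapters into a single merged adapter.
model.add_weighted_adapter(
    adapters=["a", "b"],
    weights=[0.5, 0.5],          # assumed equal weighting
    adapter_name="combined",
    combination_type="linear",
)
model.set_adapter("combined")
```

PEFT also supports other combination types (e.g. SVD-based merging); whether ICBU-NPU used a linear combination or something else is exactly what the upcoming paper is expected to clarify.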

Key Capabilities & Training

  • Adapter-based Fine-tuning: Uses two adapters trained with a forked QLoRA repository, enabling efficient fine-tuning of the quantized LLM (see the inference sketch after this list).
  • Multi-turn Conversation Support: Trained with multi-turn conversational data handling adapted from the FastChat repository, making it proficient in dialogue-based interactions.
  • Diverse Training Data: Trained on a combination of datasets, including a filtered 40K subset of OpenOrca-GPT4 and airoboros-gpt4-1.4.1, alongside 6.5K cleaned samples from the Samantha dataset.
  • Performance Benchmarks: Achieves competitive scores across standard benchmarks:
    • ARC (25-shot): 71.08
    • HellaSwag (10-shot): 87.32
    • MMLU (5-shot): 70.70
    • TruthfulQA (0-shot): 63.92
    • Average: 73.26
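Given the QLoRA lineage and FastChat-style multi-turn training, inference can follow the standard Llama-2 transformers path. The sketch below is a minimal, non-authoritative example: the 4-bit loading is optional (it mirrors QLoRA's NF4 setup), and the Vicuna-style conversation template is an assumption drawn from the FastChat lineage rather than a documented prompt format.

```python
# Minimal inference sketch for FashionGPT-70B-V1 with 4-bit (QLoRA-style)
# quantization. The Vicuna-style prompt template below is an assumption
# based on the model's FastChat lineage, not a documented format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "ICBU-NPU/FashionGPT-70B-V1"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NF4 quantization, as in QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Assumed Vicuna-style multi-turn template (FastChat convention).
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers.\n"
    "USER: What should I wear to a summer wedding? ASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```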

Good For

  • General Conversational AI: Its training on multi-turn data and diverse datasets makes it well-suited for chatbot applications and interactive dialogue systems.
  • Research into Adapter Merging: Developers interested in the novel approach of combining multiple adapters for performance gains may find this model and its upcoming paper valuable.
  • Applications requiring a Llama-2-70B base: Inherits the robust foundation of the Llama-2-70B model while adding specialized fine-tuning.