GanjinZero/wombat-7b-delta

TEXT GENERATION · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Apr 13, 2023 · Architecture: Transformer

GanjinZero/wombat-7b-delta is a 7 billion parameter instruction-following language model developed by Alibaba DAMO Academy and Tsinghua University. It is fine-tuned from Alpaca using RRHF (Rank Responses to align Human Feedback), a method that aligns the model with human preferences while using ChatGPT as a proxy for human preference scoring. The model is primarily intended for research into learning from human feedback and serves as a prototype of the RRHF methodology.
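The "-delta" suffix conventionally indicates that the released checkpoint stores delta weights, which must be added to the original LLaMA-7B base weights before the model can be used. The sketch below illustrates that recovery step on toy state dictionaries of plain floats; the exact recovery script is not shown on this card, so treat the function name and convention (delta = wombat − llama) as assumptions.

```python
def apply_delta(base_weights, delta_weights):
    """Recover full model weights by adding delta tensors to the base.

    Hypothetical sketch: assumes the delta checkpoint stores
    (fine-tuned minus base) values per parameter, the convention used
    by delta-weight releases. Shown on plain lists of floats rather
    than real tensors.
    """
    recovered = {}
    for name, delta in delta_weights.items():
        base = base_weights[name]
        # Element-wise addition reconstructs the fine-tuned parameter.
        recovered[name] = [b + d for b, d in zip(base, delta)]
    return recovered


# Toy example: two-element "parameter" named "w".
full = apply_delta({"w": [1.0, 2.0]}, {"w": [0.5, -0.5]})
```

In practice the same addition would be performed per tensor over the LLaMA-7B and wombat-7b-delta state dicts, then saved as a usable checkpoint.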

Model Overview

GanjinZero/wombat-7b-delta was released on April 13, 2023 by Alibaba DAMO Academy and Tsinghua University. It is fine-tuned from Alpaca using RRHF (Rank Responses to align Human Feedback), a training method that scores multiple candidate responses to each query and teaches the model to assign higher likelihood to better-ranked responses, with ChatGPT standing in for human annotators as the source of preference scores.

Key Capabilities

  • Instruction Following: Responds to natural-language instructions, a capability inherited from its Alpaca base.
  • Human Feedback Alignment: Aligned to human preferences via RRHF, with ChatGPT supplying the preference scores.
  • Research Prototype: Serves as a prototype for the RRHF methodology.

Intended Use Cases

  • Research on Human Feedback: Primarily intended for researchers studying methods of learning from human feedback in language models.
  • RRHF Method Exploration: Ideal for exploring and understanding the RRHF fine-tuning approach.
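For researchers querying the model, a prompt in the Alpaca style is a reasonable starting point, since Wombat is fine-tuned from Alpaca. The exact template this checkpoint expects is not stated on this card, so the format below is an assumption:

```python
def build_prompt(instruction, input_text=""):
    """Alpaca-style prompt template (assumed, since the model is
    fine-tuned from Alpaca; verify against the official repo)."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n### Response:"
        )
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:"
    )
```

The resulting string would be passed to the recovered (delta-applied) model for generation.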

Limitations

  • Not for Production: The model is not hardened for production systems and is not intended for commercial use.
  • No Competition with the OpenAI API: Because the training data was generated with ChatGPT, usage must not compete with the OpenAI API.