GanjinZero/wombat-7b-gpt4-delta

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Published: Apr 13, 2023 | Architecture: Transformer

GanjinZero/wombat-7b-gpt4-delta is a 7-billion-parameter instruction-following language model developed by Alibaba DAMO Academy and Tsinghua University. It is fine-tuned from Alpaca with RRHF (Rank Responses to Align Human Feedback), a method that aligns the model using GPT-4 as a proxy for human preferences. The model is primarily intended for research into learning from human feedback and serves as a prototype for RRHF.

Wombat-7B-GPT4-Delta: An RRHF-aligned Instruction Model

GanjinZero/wombat-7b-gpt4-delta is a 7-billion-parameter instruction-following language model developed by Alibaba DAMO Academy and Tsinghua University. Released on April 13, 2023, it is fine-tuned from Alpaca using RRHF (Rank Responses to Align Human Feedback), a method that aligns the model using GPT-4 as a proxy for human preferences.

Key Characteristics

  • Base Model: Transformer architecture, fine-tuned from Alpaca.
  • Alignment Method: Uses RRHF to align instruction-following behavior with ranked response preferences.
  • Training Data: Based on the GPT-4-LLM dataset.
  • Research Focus: Primarily intended for research into human feedback learning and as a prototype for RRHF methods.
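
To make the alignment method above more concrete, here is a minimal sketch of an RRHF-style objective as described in the RRHF paper: each candidate response gets a length-normalized log-probability score, a pairwise ranking term penalizes cases where a lower-reward response scores higher than a higher-reward one, and a fine-tuning term maximizes the likelihood of the best-rewarded response. The function names, the plain-Python list inputs, and the example rewards (standing in for GPT-4-derived preference scores) are assumptions made for illustration; this is not the authors' training code.

```python
from typing import Sequence


def length_normalized_score(token_logprobs: Sequence[float]) -> float:
    """Average per-token log-probability of one response under the model."""
    return sum(token_logprobs) / len(token_logprobs)


def rrhf_loss(
    token_logprobs: Sequence[Sequence[float]],
    rewards: Sequence[float],
) -> float:
    """Ranking loss plus fine-tuning loss for one query with several responses.

    token_logprobs[i] holds the model's log-probabilities for the tokens of
    response i; rewards[i] is that response's preference score (hypothetical
    input, e.g. derived from a proxy judge).
    """
    scores = [length_normalized_score(lp) for lp in token_logprobs]

    # Ranking term: penalize every pair where the lower-reward response
    # receives a higher model score than the higher-reward one.
    rank_loss = 0.0
    for i, r_i in enumerate(rewards):
        for j, r_j in enumerate(rewards):
            if r_i < r_j:
                rank_loss += max(0.0, scores[i] - scores[j])

    # Fine-tuning term: negative log-likelihood of the best-rewarded response.
    best = max(range(len(rewards)), key=lambda k: rewards[k])
    sft_loss = -sum(token_logprobs[best])

    return rank_loss + sft_loss


if __name__ == "__main__":
    # Two candidate responses; the first is preferred but currently scores lower.
    logprobs = [[-1.0, -1.0], [-0.5, -0.5]]
    print(rrhf_loss(logprobs, rewards=[1.0, 0.0]))
```

In a real training loop the log-probabilities would be differentiable tensors produced by the model, and the loss would be averaged over a batch of queries; the list-based version here only illustrates how the two terms combine.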

Intended Use Cases

  • Research: Ideal for researchers in natural language processing, machine learning, and artificial intelligence exploring instruction-following and human feedback alignment techniques.
  • Prototyping: Serves as a prototype for the RRHF methodology.

Limitations

  • Non-Production: Not intended for use in production systems.
  • OpenAI API: The model must not be used in ways that compete with the OpenAI API.

For more technical details, refer to the associated paper, "RRHF: Rank Responses to Align Language Models with Human Feedback without tears."