kaist-ai/selfee-13b-delta
Text generation · Model size: 13B · Quantization: FP8 · Context length: 4k · Published: May 31, 2023 · License: cc-by-nc-4.0 · Architecture: Transformer · Concurrency cost: 1 · Open weights

SelFee-13B-Delta is a 13 billion parameter instruction-following LLaMA model developed by KAIST AI, fine-tuned with a unique iterative self-revising mechanism. It generates and incorporates self-feedback to refine its answers, aiming for higher quality responses. This model is designed for tasks requiring nuanced instruction following and iterative refinement, operating within a 4096-token context window.


Overview of SelFee-13B-Delta

SelFee-13B-Delta, developed by KAIST AI, is a 13 billion parameter instruction-following LLaMA model. Its core innovation is an iterative self-revising mechanism: the model generates feedback on its own answers and uses that feedback to refine them. The model can decide autonomously when to stop revising, or it can be forced to revise a set number of times, which has been observed to improve performance.

Key Capabilities & Features

  • Self-Feedback Generation: The model is trained to generate iterative feedback and revisions, enhancing response quality.
  • Autonomous Inference Mode: SelFee can automatically terminate generation when its self-feedback indicates no further revision is needed.
  • Revision Enforce Mode: Users can enforce a minimum number of revisions, which has been observed to correlate with increased performance.
  • Data Augmentation: Training data was augmented using OpenAI API calls in a three-step process: generate an initial answer, collect feedback on it, and revise iteratively until the feedback indicated no further revision was needed or a token limit was reached.
  • LLaMA-based Architecture: Built upon the LLaMA model, leveraging its foundational capabilities.
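The two inference modes above reduce to a generate-feedback-revise loop. The sketch below is a hypothetical illustration of that loop, not the actual SelFee implementation: `StubModel`, the prompt formats, and the "no revision needed" feedback convention are all assumptions standing in for real model calls.

```python
class StubModel:
    """Deterministic stand-in for an LLM: asks for one revision, then is satisfied."""

    def __init__(self):
        self.answers = 0
        self.feedback_calls = 0

    def __call__(self, prompt: str) -> str:
        if prompt.startswith("Feedback"):
            self.feedback_calls += 1
            # First feedback requests a revision; later feedback is satisfied.
            return "Needs more detail." if self.feedback_calls == 1 else "No revision needed."
        self.answers += 1
        return f"answer v{self.answers}"


def selfee_generate(model, question: str, min_revisions: int = 0, max_revisions: int = 3):
    """Generate an answer, then iteratively self-revise using self-feedback.

    min_revisions > 0 emulates the revision-enforce mode; min_revisions == 0
    emulates the autonomous mode, where the feedback decides when to stop.
    """
    answer = model(f"Question: {question}")
    revisions = 0
    while revisions < max_revisions:
        feedback = model(f"Feedback on: {answer}")
        # Stop only if the feedback is satisfied AND the revision floor is met.
        if "no revision needed" in feedback.lower() and revisions >= min_revisions:
            break
        answer = model(f"Revise '{answer}' using feedback: {feedback}")
        revisions += 1
    return answer, revisions
```

With the stub, the autonomous mode stops after one revision, while enforcing `min_revisions=2` forces an extra revision even though the feedback was already satisfied.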

Training and Evaluation

SelFee was fine-tuned using FastChat on a diverse dataset including Stanford Alpaca, math, code, Flan, and ShareGPT collections. Evaluation was conducted on 80 diverse queries with GPT-4 as the evaluator, using a bidirectional evaluation setting (scoring each answer pair in both orders) to mitigate positional bias. While SelFee reportedly outperforms ChatGPT in the Vicuna evaluation setting, the developers note limitations in the evaluation's scope and reliability, particularly for complex tasks such as math, reasoning, factuality, and coding.
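The bidirectional idea, scoring each pair of answers in both orders and averaging, can be shown with a small sketch. The `biased_judge` below is a deliberately position-biased stub, not GPT-4, and all names are hypothetical, but it shows how averaging over both orderings cancels a positional bias.

```python
def biased_judge(first: str, second: str) -> tuple[float, float]:
    # Stub judge that awards a +1 bonus to whichever answer appears first,
    # mimicking the positional bias observed with LLM evaluators.
    base = {"model_a": 7.0, "model_b": 7.0}  # equal underlying quality
    return base[first] + 1.0, base[second]


def bidirectional_eval(judge, answer_a: str, answer_b: str) -> tuple[float, float]:
    # Score both orderings, then average each answer's two scores so that
    # any position-dependent bias in the judge cancels out.
    a_first, b_second = judge(answer_a, answer_b)
    b_first, a_second = judge(answer_b, answer_a)
    return (a_first + a_second) / 2, (b_first + b_second) / 2


score_a, score_b = bidirectional_eval(biased_judge, "model_a", "model_b")
# Equal-quality answers end up with equal averaged scores despite the bias.
```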

Use Cases

This model is particularly suited for applications requiring high-quality, iteratively refined responses to instructions, where the ability to self-correct and improve output is beneficial. Its unique training methodology makes it a strong candidate for tasks demanding nuanced understanding and precise execution, especially when iterative refinement can lead to better outcomes.