allenai/tulu-v2.5-ppo-13b-nectar-60k

Text Generation | Concurrency Cost: 1 | Model Size: 13B | Quant: FP8 | Ctx Length: 4k | Published: Jun 11, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

allenai/tulu-v2.5-ppo-13b-nectar-60k is a 13-billion-parameter language model developed by AllenAI and fine-tuned from Llama-2-13b-hf. Part of the Tulu V2.5 series, it was trained with PPO on a 60k subsample of the Nectar dataset to act as a helpful assistant. It learns from preference feedback, using a reward model trained on the same Nectar split.

Overview

allenai/tulu-v2.5-ppo-13b-nectar-60k is a 13-billion-parameter language model developed by AllenAI on top of the meta-llama/Llama-2-13b-hf base model. It belongs to the Tulu V2.5 suite, a set of models trained with DPO and PPO on preference feedback. This model was fine-tuned with PPO on a 60,000-sample subset of the Nectar dataset, using a dedicated 13B reward model.
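For readers unfamiliar with PPO, the following is a minimal sketch of the standard clipped surrogate loss. It is a textbook illustration, not the training code used for this model; the function name, the `clip_eps` default of 0.2, and the plain-list inputs are assumptions made for the example. In RLHF the advantages would typically be derived from reward-model scores.

```python
import math


def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss over a batch of sampled actions.

    logp_new / logp_old: log-probabilities of the same actions under the
    current policy and the policy that generated the samples.
    advantages: advantage estimates for those actions.
    clip_eps: clipping range (0.2 is a common default, assumed here).
    """
    losses = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the current and the sampling policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic objective, negated to form a loss.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

The clipping removes the incentive to move the policy far from the one that produced the samples within a single update, which is the main reason PPO is a common choice for this kind of fine-tuning.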

Key Capabilities

  • Helpful Assistant: Designed and trained to act as a conversational assistant.
  • Preference Learning: Leverages Proximal Policy Optimization (PPO) with a reward model for improved alignment based on preference feedback.
  • Instruction Following: Initially fine-tuned on a diverse mix of human-created instructions and synthetic dialogues from the Tulu V2 dataset.
  • Specific Input Format: Optimized for a <|user|> and <|assistant|> chat template, requiring a newline after <|assistant|> for optimal generation quality.
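The input format described above can be sketched as a small prompt builder. The helper name `build_tulu_prompt` is hypothetical, and placing the user message on its own line between the two tags is an assumption about the template layout; the only detail stated in this card is the newline after <|assistant|>.

```python
def build_tulu_prompt(user_message: str) -> str:
    """Wrap a single user message in the <|user|> / <|assistant|> format.

    Assumed layout (hypothetical helper, not an official API):
        <|user|>
        {message}
        <|assistant|>
    The trailing newline after <|assistant|> is kept on purpose, since the
    card notes that it matters for generation quality.
    """
    return f"<|user|>\n{user_message.strip()}\n<|assistant|>\n"


if __name__ == "__main__":
    # Show the raw string so the trailing newline is visible.
    print(repr(build_tulu_prompt("What is the capital of France?")))
```

The resulting string would then be tokenized and passed to the model as the prompt. If the tokenizer ships its own chat template, that template is likely the safer source of truth.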

Limitations

  • The model has not been explicitly aligned for safety during the RLHF phase and lacks in-the-loop filtering, meaning it can produce problematic outputs if prompted to do so.