allenai/tulu-v2.5-ppo-13b-stackexchange-60k

Text Generation | Concurrency Cost: 1 | Model Size: 13B | Quant: FP8 | Ctx Length: 4k | Published: Jun 11, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

The allenai/tulu-v2.5-ppo-13b-stackexchange-60k model is a 13 billion parameter language model developed by AllenAI, fine-tuned from Llama-2-13b-hf. It is part of the Tulu V2.5 series, specifically aligned using Proximal Policy Optimization (PPO) on a 60k subsample of the StackExchange dataset. This model is designed to function as a helpful assistant, with a particular focus on generating responses relevant to StackExchange-like queries.

Tulu V2.5 PPO 13B - StackExchange 60K Overview

This model is a 13-billion-parameter language model from the Tulu V2.5 series, developed by AllenAI. It is fine-tuned from meta-llama/Llama-2-13b-hf and aligned using Proximal Policy Optimization (PPO). PPO training used a 60k random subsample of the StackExchange dataset, with rewards from a dedicated 13B reward model trained on the same dataset.

Key Characteristics

  • Base Model: Fine-tuned from Llama-2-13b-hf.
  • Alignment Method: Aligned with PPO. The Tulu V2.5 series applies DPO and PPO training on top of the Tulu 2 suite.
  • Specialized Training Data: Aligned on a 60k subsample of the StackExchange dataset, making it particularly relevant for technical Q&A and assistant-like interactions.
  • Input Format: Requires a specific chat template for optimal performance:
    <|user|>
    Your message here!
    <|assistant|>
    It is crucial to include a newline after <|assistant|>.
  • License: Apache 2.0.
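
The input format listed above can be sketched as a small helper. This is a minimal illustration, not an official utility: the function name build_prompt and the choice to strip surrounding whitespace from the message are assumptions, and only the single-turn layout shown above is covered.

```python
# Tags taken from the chat template shown in the Input Format bullet.
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(user_message: str) -> str:
    """Wrap a single user message in the Tulu chat format.

    Hypothetical helper for illustration; only the tag layout and the
    trailing newline after <|assistant|> come from the model card.
    """
    message = user_message.strip()  # assumption: surrounding whitespace is not meaningful
    if not message:
        raise ValueError("user_message must not be empty")
    # The prompt ends with a newline after <|assistant|>, as the card requires.
    return f"{USER_TAG}\n{message}\n{ASSISTANT_TAG}\n"


if __name__ == "__main__":
    prompt = build_prompt("How do I reverse a list in Python?")
    print(repr(prompt))
```

The resulting string would be passed as the raw prompt to whatever inference client is in use. The model's reply is expected to begin right after the final newline.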

Intended Uses & Limitations

This model is designed to act as a helpful assistant, especially for the kind of questions found on StackExchange. It was initially fine-tuned on a diverse mix of human-created instructions and synthetic dialogues. The Tulu models have not been aligned for safety during the RLHF phase, so they can produce problematic outputs if prompted to do so. Users should be aware of the biases and risks inherent in models trained on broad web data.