allenai/tulu-v2.5-ppo-13b-stackexchange-60k
The allenai/tulu-v2.5-ppo-13b-stackexchange-60k model is a 13 billion parameter language model developed by AllenAI, fine-tuned from Llama-2-13b-hf. It is part of the Tulu V2.5 series, specifically aligned using Proximal Policy Optimization (PPO) on a 60k subsample of the StackExchange dataset. This model is designed to function as a helpful assistant, with a particular focus on generating responses relevant to StackExchange-like queries.
Tulu V2.5 PPO 13B - StackExchange 60K Overview
This model is a 13 billion parameter language model from the Tulu V2.5 series, developed by AllenAI. It is fine-tuned from meta-llama/Llama-2-13b-hf and aligned using Proximal Policy Optimization (PPO). PPO training used a 60k random subsample of the StackExchange dataset, with rewards from a dedicated 13B reward model trained on the same dataset.
Key Characteristics
- Base Model: Fine-tuned from Llama-2-13b-hf.
- Alignment Method: Aligned with PPO. The Tulu V2.5 series applies DPO and PPO training on top of the Tulu 2 suite.
- Specialized Training Data: Aligned on a 60k subsample of the StackExchange dataset, making it particularly relevant for technical Q&A and assistant-like interactions.
- Input Format: Requires a specific chat template for optimal performance:
  <|user|>
  Your message here!
  <|assistant|>
  It is crucial to include a newline after <|assistant|>.
- License: Apache 2.0.
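As a minimal sketch, the chat template above can be built programmatically so that the trailing newline after <|assistant|> is never forgotten. The helper names `format_prompt` and `generate_reply` are hypothetical and not part of the model card. The generation step assumes the Hugging Face `transformers` and `torch` packages (plus `accelerate` for `device_map="auto"`) are installed and that enough memory is available for a 13B model.

```python
MODEL_ID = "allenai/tulu-v2.5-ppo-13b-stackexchange-60k"
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def format_prompt(user_message: str) -> str:
    """Wrap a single user message in the template described above.

    Hypothetical helper: each tag sits on its own line, and the prompt
    ends with a newline right after the assistant tag.
    """
    return f"{USER_TAG}\n{user_message.strip()}\n{ASSISTANT_TAG}\n"


def generate_reply(user_message: str, max_new_tokens: int = 256) -> str:
    """Sketch of one generation call (assumes transformers, torch, accelerate)."""
    # Imported lazily so that format_prompt works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(format_prompt(user_message), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(repr(format_prompt("How do I reverse a list in Python?")))
    # prints '<|user|>\nHow do I reverse a list in Python?\n<|assistant|>\n'
```

Keeping the formatting in one small function makes the required newline explicit and easy to check. The generation settings shown (half precision, 256 new tokens) are illustrative choices, not recommendations from the model authors.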
Intended Uses & Limitations
This model is designed to act as a helpful assistant, especially for the kind of questions found on StackExchange. It was initially fine-tuned on a diverse mix of human-created instructions and synthetic dialogues. The Tulu models have not been aligned for safety in the RLHF phase, so they can produce problematic outputs if prompted to do so. Users should be aware of the potential biases and risks inherent in models trained on broad web data.