Run Tulu-v2.5-ppo-13B-stackexchange-60K API (Easy Deployment & Flat-Rate Pricing)

Name: allenai/tulu-v2.5-ppo-13b-stackexchange-60k API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: allenai

Tulu V2.5 PPO 13B - StackExchange 60K Overview

This model is a 13 billion parameter language model from the Tulu V2.5 series, developed by AllenAI. It is fine-tuned from meta-llama/Llama-2-13b-hf and aligned using Proximal Policy Optimization (PPO). PPO training used a random 60k subsample of the StackExchange dataset, with rewards from a dedicated 13B reward model also trained on this dataset.

Key Characteristics

Base Model: Fine-tuned from Llama-2-13b-hf.
Alignment Method: Utilizes PPO for alignment, building upon the Tulu 2 suite's DPO and PPO training.
Specialized Training Data: Aligned on a 60k subsample of the StackExchange dataset, making it particularly relevant for technical Q&A and assistant-like interactions.
Input Format: Requires a specific chat template for optimal performance:
```
<|user|>
Your message here!
<|assistant|>
```
Always include a newline after <|assistant|>; omitting it can noticeably degrade generation quality.
License: Apache 2.0.
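
The input format listed above can be assembled with a small helper. This is a minimal sketch rather than an official utility: the function name `format_tulu_prompt` is hypothetical, and it covers only a single user turn, which is the only case the template above describes.

```python
def format_tulu_prompt(user_message: str) -> str:
    """Wrap a single user message in the chat template shown above."""
    # The trailing "\n" after <|assistant|> is the newline the template requires.
    return f"<|user|>\n{user_message}\n<|assistant|>\n"


# Build a prompt and show it with escape sequences visible,
# so the position of each newline is easy to check.
prompt = format_tulu_prompt("How do I reverse a list in Python?")
print(repr(prompt))
```

The resulting string can be sent as the raw prompt to whichever completion interface serves the model. Generation should continue directly after the final newline.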

Intended Uses & Limitations

This model is designed to act as a helpful assistant, especially for tasks related to the kind of information found on StackExchange. It was initially fine-tuned on a diverse mix of human-created instructions and synthetic dialogues. However, the Tulu models have not been aligned for safety in the RLHF phase, so they can produce problematic outputs, particularly when prompted to do so. Users should be aware of the biases and risks inherent in models trained on broad web data.
