Plans and Concurrency Limits
How subscription tiers translate into limits on concurrent inference calls.
Unlike most serverless offerings, which bill by tokens (so the monthly bill varies with usage), Featherless plans are effectively reservations of capacity: each plan determines the maximum size and concurrency of the models you can access.
Individual plans are designed to enable use of even the largest models in our catalogue, as well as concurrent connections to smaller models, whereas business plans are designed to scale easily from 2 concurrent requests upward with no fixed ceiling.
Model Concurrency Costs
Each model has a concurrency cost based on its size:
| Model Size | Concurrency Cost | Example Models |
|---|---|---|
| 7B to 15B | 1 | Qwen 2.5 7B, Llama 2 13B |
| 24B to 34B | 2 | Qwen 32B coder, Mistral 3 24B |
| 70B and 72B | 4 | Llama 3.3 70B, Qwen 2.5 72B |
| DeepSeek V3 and R1 | 4 (individual plans only) | DeepSeek V3 and R1 |
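As an illustration, the size bands in the table can be written as a small lookup. This is a minimal sketch, not part of any Featherless SDK: `COST_BANDS` and `concurrency_cost` are hypothetical names, and the DeepSeek models, which the table lists by name rather than by size, are left out.

```python
# Size bands copied from the table above: (min size, max size, cost),
# with sizes in billions of parameters.
COST_BANDS = [
    (7, 15, 1),
    (24, 34, 2),
    (70, 72, 4),
]


def concurrency_cost(params_b: float) -> int:
    """Return the concurrency cost of a model with params_b billion parameters."""
    for low, high, cost in COST_BANDS:
        if low <= params_b <= high:
            return cost
    # The table lists no cost for sizes outside these bands, so refuse to guess.
    raise ValueError(f"no concurrency cost listed for a {params_b}B model")


print(concurrency_cost(7))   # 1
print(concurrency_cost(32))  # 2
print(concurrency_cost(72))  # 4
```

Sizes that fall between the listed bands raise an error rather than being rounded to a neighbouring band, since the table does not say how they are priced.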
Plans and Allotments
Feather Basic
Concurrency Allotted: 2
Maximum model size: 15B
What You Can Run:
2 concurrent requests to any model ≤15B
Feather Premium
Concurrency Allotted: 4
Maximum model size: no limit (i.e. DeepSeek V3 and R1 included)
What You Can Run:
4 concurrent requests to models ≤15B, OR
2 concurrent requests to models ≤34B, OR
1 concurrent request to any model ≥70B, OR
Any combination with concurrency cost up to 4
Feather Scale
Concurrency Allotted: 8 per scale unit
Maximum model size: 72B (i.e. DeepSeek V3 and R1 currently excluded)
What You Can Run:
8 concurrent requests to models ≤15B, OR
4 concurrent requests to models ≤34B, OR
2 concurrent requests to models ≤72B, OR
Any combination with concurrency cost up to 8
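The three plans above can be captured in a small data structure to work out how many simultaneous requests a plan allows for a given model. This is a rough sketch under stated assumptions: `Plan`, `PLANS`, `scale_plan`, and `max_concurrent` are hypothetical names, the model's concurrency cost is passed in by the caller, and the Feather Scale allotment is assumed to grow linearly at 8 per scale unit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    allotment: int               # total concurrency cost the plan allows
    max_params_b: Optional[int]  # largest model size in billions; None = no limit


def scale_plan(units: int = 1) -> Plan:
    # Assumption: the allotment grows linearly, 8 per scale unit.
    return Plan("Feather Scale", 8 * units, 72)


PLANS = {
    "basic": Plan("Feather Basic", 2, 15),
    "premium": Plan("Feather Premium", 4, None),
    "scale": scale_plan(1),
}


def max_concurrent(plan: Plan, params_b: float, cost: int) -> int:
    """Most simultaneous requests the plan allows to one model of this size and cost."""
    if plan.max_params_b is not None and params_b > plan.max_params_b:
        return 0  # model is larger than the plan permits
    return plan.allotment // cost


print(max_concurrent(PLANS["premium"], 72, 4))  # 1
print(max_concurrent(PLANS["scale"], 32, 2))    # 4
print(max_concurrent(PLANS["basic"], 32, 2))    # 0
```

Integer division reproduces the "What You Can Run" figures: the allotment divided by the per-request cost, rounded down.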
Examples
Feather Premium Examples
Example 1: Run 4 simultaneous requests to Qwen2.5-7B (≤15B)
Example 2: Run 2 simultaneous requests to Qwen2.5-32B (24B to 34B)
Example 3: Run 1 request to Qwen2.5-72B (≥70B)
Example 4 (Mixed): Run 1 Qwen2.5-32B (cost 2) + 2 Qwen2.5-7B (cost 1 each) = Total cost 4
Feather Scale Examples
Example 1: Run 8 simultaneous requests to Qwen2.5-7B (≤15B)
Example 2: Run 4 simultaneous requests to Qwen2.5-32B (24B to 34B)
Example 3: Run 2 simultaneous requests to Qwen2.5-72B (≥70B)
Example 4 (Mixed): Run 1 Qwen2.5-72B (cost 4) + 2 Qwen2.5-32B (cost 2 each) = Total cost 8
Combinations
Your total concurrency usage is the sum of the concurrency costs of all requests in flight at the same time. For example:
1 request with cost 2 + 2 requests with cost 1 each = Total concurrency cost 4
1 request with cost 4 + 2 requests with cost 2 each = Total concurrency cost 8
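The arithmetic in these two examples can be checked with a few lines of Python. This is only a sketch: `total_cost` and `fits` are hypothetical helper names, and each entry is a (number of requests, cost per request) pair.

```python
def total_cost(in_flight: list[tuple[int, int]]) -> int:
    """Sum the concurrency cost of (request_count, cost_per_request) pairs."""
    return sum(count * cost for count, cost in in_flight)


def fits(in_flight: list[tuple[int, int]], allotment: int) -> bool:
    """True when the combined cost stays within the plan's allotment."""
    return total_cost(in_flight) <= allotment


print(total_cost([(1, 2), (2, 1)]))  # 4
print(total_cost([(1, 4), (2, 2)]))  # 8
print(fits([(1, 4), (2, 2)], 4))     # False
```

The last line shows the second combination exceeding an allotment of 4, which is the allotment of Feather Premium.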
Requests that would take the total concurrency cost above the subscriber’s purchased allotment are rejected with HTTP status code 429.
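One possible client-side response to a 429 is to wait and retry once earlier requests have finished. The sketch below assumes exponential backoff, which is a common convention and not something this page prescribes. `call_with_backoff` is a hypothetical helper, and `send` stands in for whatever function issues your request and returns its HTTP status code.

```python
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a status other than 429 or attempts run out."""
    status = 429
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Assumed policy: double the wait after each rejected attempt.
            sleep(base_delay * 2 ** attempt)
    return status


# Demo with a fake sender that is rejected twice, then succeeds.
responses = iter([429, 429, 200])
waits: list[float] = []
print(call_with_backoff(lambda: next(responses), sleep=waits.append))  # 200
print(waits)  # [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the demo instant; in real use the default `time.sleep` applies. Capping the number of requests in flight so the total cost never exceeds the allotment avoids most 429 responses in the first place.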