Plans and Concurrency Limits

How subscription tiers translate into maximum concurrent inference requests.

Unlike most serverless offerings, which bill by tokens (so monthly charges vary with usage), Featherless plans are effectively reservations of capacity: they determine the maximum size and concurrency of the models you can access.

Individual plans are designed to give access to even the largest models in our catalogue, as well as multiple concurrent connections to smaller models. Business plans are designed to scale easily, from 2 concurrent requests upward with no fixed ceiling.

Model Concurrency Costs

Each model has a concurrency cost based on its size:

*Model Sizes*	*Concurrency Cost*	*Example Models*
7B to 15B	1	Qwen 2.5 7B, Llama 2 13B
24B to 34B	2	Qwen 32B Coder, Mistral 3 24B
70B and 72B	4	Llama 3.3 70B, Qwen 2.5 72B
DeepSeek & Kimi-K2	4 (individual plans only)	DeepSeek V3, R1 & Kimi-K2
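The table can be read as a simple lookup. The sketch below assumes you pass the nominal size that appears in the model's name (e.g. 72 for Qwen 2.5 72B); the function name `concurrency_cost` and its `family` parameter are hypothetical, not part of any Featherless API. Sizes the table does not list raise an error rather than being guessed.

```python
def concurrency_cost(nominal_size_b: float, family: str = "") -> int:
    """Look up a model's concurrency cost from the table (illustrative sketch)."""
    # DeepSeek and Kimi models have their own row, regardless of parameter count.
    # `family` is a hypothetical parameter used only for this illustration.
    if family.lower() in {"deepseek", "kimi"}:
        return 4
    # Size bands from the table, matched on the nominal size in the model name.
    bands = [((7, 15), 1), ((24, 34), 2), ((70, 72), 4)]
    for (low, high), cost in bands:
        if low <= nominal_size_b <= high:
            return cost
    # The table lists no cost for other sizes, so refuse to guess.
    raise ValueError(f"no listed concurrency cost for a {nominal_size_b}B model")


print(concurrency_cost(7))    # 1 (e.g. Qwen 2.5 7B)
print(concurrency_cost(32))   # 2 (e.g. Qwen 32B Coder)
print(concurrency_cost(70))   # 4 (e.g. Llama 3.3 70B)
print(concurrency_cost(671, family="deepseek"))  # 4 (DeepSeek V3)
```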

Plans and Allotments

Feather Basic

Concurrency Allotted: 2
Maximum model size: 15B
What You Can Run:
- 2 concurrent requests to any model ≤15B

Feather Premium

Concurrency Allotted: 4
Maximum model size: No limit (DeepSeek V3, R1 and Kimi-K2 are included)
What You Can Run:
- 4 concurrent requests to models ≤15B, OR
- 2 concurrent requests to models ≤34B, OR
- 1 concurrent request to any model ≥70B, OR
- Any combination with concurrency cost up to 4

Feather Scale

Concurrency Allotted: 8 per scale unit
Maximum model size: 72B (DeepSeek V3, R1 and Kimi-K2 are currently excluded)
What You Can Run:
- 8 concurrent requests to models ≤15B, OR
- 4 concurrent requests to models ≤34B, OR
- 2 concurrent requests to models of 70B to 72B, OR
- Any combination with concurrency cost up to 8
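Each "What You Can Run" list follows from dividing the plan's allotment by a model's concurrency cost, provided the model is within the plan's size limit. The sketch below encodes the three plans as listed; the `PLANS` dictionary and `max_concurrent_requests` function are hypothetical names, and treating Feather Scale's allotment as 8 multiplied by the number of scale units is an assumption drawn from "8 per scale unit".

```python
# Plan parameters as listed above. max_size_b=None means no size limit.
# Assumption: Feather Scale's allotment is 8 multiplied by the number of scale units.
PLANS = {
    "Feather Basic": {"allotment": 2, "max_size_b": 15},
    "Feather Premium": {"allotment": 4, "max_size_b": None},
    "Feather Scale": {"allotment": 8, "max_size_b": 72},
}


def max_concurrent_requests(plan: str, cost: int, size_b: float, scale_units: int = 1) -> int:
    """How many simultaneous requests to one model fit in a plan's allotment."""
    limits = PLANS[plan]
    allotment = limits["allotment"]
    if plan == "Feather Scale":
        allotment *= scale_units
    max_size = limits["max_size_b"]
    if max_size is not None and size_b > max_size:
        return 0  # the model is too large for this plan
    # Integer division: only whole requests can run.
    return allotment // cost


print(max_concurrent_requests("Feather Premium", cost=2, size_b=32))  # 2
print(max_concurrent_requests("Feather Scale", cost=4, size_b=72))    # 2
print(max_concurrent_requests("Feather Scale", cost=4, size_b=671))   # 0 (DeepSeek V3)
```

Size is checked before cost because a plan's size limit excludes a model outright, no matter how much allotment is free.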

Examples

Feather Premium Examples

Example 1: Run 4 simultaneous requests to Qwen2.5-7B (≤15B)
Example 2: Run 2 simultaneous requests to Qwen2.5-32B (24B to 34B)
Example 3: Run 1 simultaneous request to Qwen2.5-72B (≥70B)
Example 4 (Mixed): Run 1 Qwen2.5-32B (cost 2) + 2 Qwen2.5-7B (cost 1 each) = Total cost 4

Feather Scale Examples

Example 1: Run 8 simultaneous requests to Qwen2.5-7B (≤15B)
Example 2: Run 4 simultaneous requests to Qwen2.5-32B (24B to 34B)
Example 3: Run 2 simultaneous requests to Qwen2.5-72B (≥70B)
Example 4 (Mixed): Run 1 Qwen2.5-72B (cost 4) + 2 Qwen2.5-32B (cost 2 each) = Total cost 8

Combinations

Your total concurrency usage is the sum of the concurrency costs of all requests running simultaneously. For example:

1 request to a cost-2 model + 2 requests to cost-1 models = Total concurrency cost 4
1 request to a cost-4 model + 2 requests to cost-2 models = Total concurrency cost 8
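The summing rule is plain addition over whatever is in flight at a given moment. This minimal sketch (function names are hypothetical) reproduces the two combinations and checks them against an allotment.

```python
def total_concurrency_cost(in_flight_costs: list[int]) -> int:
    """Sum the concurrency costs of all requests running at the same time."""
    return sum(in_flight_costs)


def within_allotment(in_flight_costs: list[int], allotment: int) -> bool:
    """True if the combined cost does not exceed the plan's allotment."""
    return total_concurrency_cost(in_flight_costs) <= allotment


# One cost-2 request plus two cost-1 requests, then one cost-4 plus two cost-2.
print(total_concurrency_cost([2, 1, 1]))  # 4
print(total_concurrency_cost([4, 2, 2]))  # 8
print(within_allotment([4, 2, 2], 4))     # False: exceeds a 4-unit allotment
```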

Requests that would take the total concurrency cost above the subscriber’s purchased allotment are rejected with HTTP status code 429.
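Rather than relying on 429 responses, a client can keep its own in-flight cost under the allotment. The following is a minimal asyncio sketch under stated assumptions: `CostBudget` and `fake_request` are hypothetical names, not part of any Featherless SDK; the client must already know each model's concurrency cost; and `fake_request` stands in for a real API call. A request that would exceed the budget waits until enough capacity is released.

```python
import asyncio


class CostBudget:
    """Hypothetical client-side gate keeping in-flight concurrency cost within an allotment."""

    def __init__(self, allotment: int) -> None:
        self.allotment = allotment
        self.in_use = 0
        self._cond = asyncio.Condition()

    async def acquire(self, cost: int) -> None:
        if cost > self.allotment:
            raise ValueError("a single request exceeds the whole allotment")
        async with self._cond:
            # Block until this request's cost fits alongside those already running.
            await self._cond.wait_for(lambda: self.in_use + cost <= self.allotment)
            self.in_use += cost

    async def release(self, cost: int) -> None:
        async with self._cond:
            self.in_use -= cost
            self._cond.notify_all()  # wake waiters so they re-check the budget


async def fake_request(budget: CostBudget, cost: int, log: list[int]) -> None:
    # Stand-in for a real API call; records in-flight cost right after admission.
    await budget.acquire(cost)
    try:
        log.append(budget.in_use)
        await asyncio.sleep(0.01)
    finally:
        await budget.release(cost)


async def main() -> list[int]:
    budget = CostBudget(allotment=4)  # e.g. Feather Premium
    log: list[int] = []
    # One 32B-class request (cost 2) and four 7B-class requests (cost 1 each).
    await asyncio.gather(*(fake_request(budget, c, log) for c in [2, 1, 1, 1, 1]))
    return log


if __name__ == "__main__":
    print("peak in-flight cost:", max(asyncio.run(main())))
```

This gate only sees requests made through it, so a real client would likely still handle 429 responses, for example when several processes share one subscription.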

Last edited: Jul 16, 2025