Plans and Concurrency Limits

How subscription tiers translate into the maximum number of concurrent inference calls.

Unlike most serverless offerings, which bill by tokens (so the monthly amount varies with usage), Featherless plans are effectively reservations of capacity that determine the maximum size and concurrency of the models you can access.

Individual plans are designed to enable use of even the largest models in our catalogue, as well as concurrent connections to smaller models, whereas business plans are designed to scale easily from 2 concurrent requests upward, with no fixed ceiling.

Model Concurrency Costs

Each model has a concurrency cost based on its size:

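| Model Sizes        | Concurrency Cost          | Example Models                |
|--------------------|---------------------------|-------------------------------|
| 7B to 15B          | 1                         | Qwen 2.5 7B, Llama2 13B       |
| 24B to 34B         | 2                         | Qwen 32B coder, Mistral 3 24B |
| 70B and 72B        | 4                         | Llama 3.3 70B, Qwen 2.5 72B   |
| DeepSeek V3 and R1 | 4 (individual plans only) | DeepSeek V3 and R1            |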
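The size tiers above can be expressed as a small lookup. This is only an illustrative sketch, not part of any Featherless SDK: the function name `concurrency_cost` and the `SIZE_TIERS` list are hypothetical, and treating every size up to a tier's upper bound as belonging to that tier (including sizes that fall between the listed ranges) is an assumption. DeepSeek V3 and R1 are handled by name in the table, so they are deliberately left out here.

```python
# Hypothetical helper: maps a model's parameter count (in billions) to the
# concurrency cost listed in the table. Tier upper bounds come from the table;
# applying them to sizes between the listed ranges is an assumption.
SIZE_TIERS = [
    (15, 1),  # 7B to 15B   -> cost 1
    (34, 2),  # 24B to 34B  -> cost 2
    (72, 4),  # 70B and 72B -> cost 4
]


def concurrency_cost(params_b: float) -> int:
    """Return the concurrency cost for a model with `params_b` billion parameters."""
    for upper_bound, cost in SIZE_TIERS:
        if params_b <= upper_bound:
            return cost
    # Larger models (e.g. DeepSeek V3 and R1) are listed by name, not by size.
    raise ValueError(f"no size tier covers a {params_b}B model")


print(concurrency_cost(7))   # 1
print(concurrency_cost(32))  # 2
print(concurrency_cost(72))  # 4
```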


Plans and Allotments

Feather Basic

  • Concurrency Allotted: 2

  • Maximum model size: 15B

  • What You Can Run:

    • 2 concurrent requests to any model ≤15B

Feather Premium

  • Concurrency Allotted: 4

  • Maximum model size: no limit (i.e. DeepSeek V3 and R1 are included)

  • What You Can Run:

    • 4 concurrent requests to models ≤15B, OR

    • 2 concurrent requests to models ≤34B, OR

    • 1 concurrent request to any model ≥70B, OR

    • Any combination with concurrency cost up to 4

Feather Scale

  • Concurrency Allotted: 8 per scale unit

  • Maximum model size: 72B (i.e. DeepSeek V3 and R1 are currently excluded)

  • What You Can Run:

    • 8 concurrent requests to models ≤15B, OR

    • 4 concurrent requests to models ≤34B, OR

    • 2 concurrent requests to models ≤72B, OR

    • Any combination with concurrency cost up to 8
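As a rough illustration of how the allotments above turn into request counts, the sketch below divides a plan's allotment by a model's concurrency cost after checking the plan's size limit. The `Plan` class, the `PLANS` dictionary keys, and `max_concurrent` are hypothetical names chosen for this example; the numbers are the ones listed for each plan, and multiplying the Feather Scale allotment by the number of scale units is an assumption based on "8 per scale unit".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    allotment: int                # concurrency units (per scale unit for Feather Scale)
    max_size_b: Optional[float]   # largest allowed model in billions; None = no limit


# Values copied from the plan descriptions; the dictionary keys are illustrative.
PLANS = {
    "basic": Plan("Feather Basic", 2, 15),
    "premium": Plan("Feather Premium", 4, None),
    "scale": Plan("Feather Scale", 8, 72),
}


def max_concurrent(plan: Plan, model_size_b: float, model_cost: int, units: int = 1) -> int:
    """How many simultaneous requests to one model fit in the plan's allotment.

    `units` is only meaningful for Feather Scale (assumed: allotment scales linearly).
    """
    if plan.max_size_b is not None and model_size_b > plan.max_size_b:
        return 0  # model is too large for this plan
    return (plan.allotment * units) // model_cost


print(max_concurrent(PLANS["premium"], 72, 4))        # 1
print(max_concurrent(PLANS["scale"], 32, 2))          # 4
print(max_concurrent(PLANS["basic"], 32, 2))          # 0 (over the 15B limit)
print(max_concurrent(PLANS["scale"], 72, 4, units=3)) # 6
```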


Examples

Feather Premium Examples

  1. Example 1: Run 4 simultaneous requests to Qwen2.5-7B (≤15B)

  2. Example 2: Run 2 simultaneous requests to Qwen2.5-32B (24B to 34B)

  3. Example 3: Run 1 simultaneous request to Qwen2.5-72B (≥70B)

  4. Example 4 (Mixed): Run 1 Qwen2.5-32B (cost 2) + 2 Qwen2.5-7B (cost 1 each) = Total cost 4

Feather Scale Examples

  1. Example 1: Run 8 simultaneous requests to Qwen2.5-7B (≤15B)

  2. Example 2: Run 4 simultaneous requests to Qwen2.5-32B (24B to 34B)

  3. Example 3: Run 2 simultaneous requests to Qwen2.5-72B (≥70B)

  4. Example 4 (Mixed): Run 1 Qwen2.5-72B (cost 4) + 2 Qwen2.5-32B (cost 2 each) = Total cost 8


Combinations

Your total concurrency usage is the sum of the concurrency costs of all requests in flight at the same time. For example:

  • 1 request to a model with cost 2 + 2 requests to models with cost 1 = Total concurrency cost 4

  • 1 request to a model with cost 4 + 2 requests to models with cost 2 = Total concurrency cost 8
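One way to keep track of this sum on the client side is a small budget object that adds a request's cost when it starts and subtracts it when it finishes. The sketch below is a hypothetical helper (the class name `ConcurrencyBudget` and its methods are not part of any Featherless library) and only mirrors the arithmetic described here; the service's own accounting may differ.

```python
import threading


class ConcurrencyBudget:
    """Tracks the summed concurrency cost of in-flight requests (client-side sketch)."""

    def __init__(self, allotment: int) -> None:
        self.allotment = allotment
        self.in_use = 0
        self._lock = threading.Lock()

    def try_acquire(self, cost: int) -> bool:
        """Reserve `cost` units if the total stays within the allotment."""
        with self._lock:
            if self.in_use + cost > self.allotment:
                return False
            self.in_use += cost
            return True

    def release(self, cost: int) -> None:
        """Return `cost` units once the request has finished."""
        with self._lock:
            self.in_use = max(0, self.in_use - cost)


# Allotment of 4: one cost-2 request plus two cost-1 requests fill the budget.
budget = ConcurrencyBudget(allotment=4)
print(budget.try_acquire(2))  # True  (total 2)
print(budget.try_acquire(1))  # True  (total 3)
print(budget.try_acquire(1))  # True  (total 4)
print(budget.try_acquire(1))  # False (would be 5)
budget.release(2)
print(budget.try_acquire(2))  # True  (back to 4)
```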

Requests that would take the total concurrency cost above the subscriber’s purchased allotment are rejected with HTTP status code 429 (Too Many Requests).
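A client can treat a 429 as a signal to wait and try again once an earlier request has finished. The following is a minimal sketch of exponential backoff, assuming the request is wrapped in a caller-supplied `send` function that returns a `(status_code, body)` tuple. Both `send` and `call_with_backoff` are hypothetical names, and the delay values are arbitrary defaults rather than recommendations from Featherless.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns something other than 429 or attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between attempts.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body


# Usage with a stand-in for a real request: two rejections, then success.
fake_responses = iter([(429, ""), (429, ""), (200, "ok")])
waits = []
print(call_with_backoff(lambda: next(fake_responses), sleep=waits.append))  # (200, 'ok')
print(waits)  # [0.5, 1.0]
```

Injecting `sleep` keeps the helper easy to test without real delays; in production code the default `time.sleep` applies.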