
/account/concurrency/stream

Live stream of in-flight inference requests and concurrency usage for your organization.


The concurrency stream endpoint publishes a live view of the inference requests currently running against your organization's concurrency allotment. It is designed for dashboards, autoscalers and any client that needs near-real-time visibility into how much of the plan's concurrent-request budget is in use.

HTTP request

GET https://api.featherless.ai/account/concurrency/stream

Authentication

Authenticate with either an API key sent as a Bearer token in the Authorization header, or an active browser session cookie. The returned usage is always scoped to the authenticated user's active organization.

Streaming format

The response is a Server-Sent Events (SSE) stream with content type text/event-stream. The server emits one event immediately on connect and then one every two seconds. Each event is a standard SSE frame: a single data: line carrying a JSON object ({…}), terminated by a blank line (\n\n). The stream stays open until the client disconnects.
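As an illustration of this framing, here is a minimal client-side parser sketch in Python. It assumes the response body has already been decoded into text lines, keeps only the data field, treats a blank line as the end of an event, and skips comment lines starting with ":". The function name parse_sse is hypothetical and not part of any Featherless SDK.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE event (hypothetical helper).

    A blank line ends an event; lines starting with ':' are comments;
    multiple 'data:' lines in one event are joined with newlines.
    """
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if there is one.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # drop the single optional space after the colon
        if field == "data":
            data_lines.append(value)
        # Other SSE fields (event, id, retry) are ignored in this sketch.
```

Buffering data lines until the blank separator, rather than decoding each data: line on sight, keeps the parser correct even if a payload were ever split across several data: lines.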

Response schema

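Each event's data line carries a JSON object of the following shape (values shown are examples):

```json
{
  "limit": 8,
  "used_cost": 4,
  "request_count": 1,
  "requests": [
    {
      "id": "5c4e82ea-2023-46ac-b6a5-025cf2f312fe",
      "cost": 4,
      "model": "moonshotai/Kimi-K2.6",
      "started_at": 1776783127496,
      "duration_ms": 4084
    }
  ]
}
```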

Top-level fields

| Field | Type | Description |
|---|---|---|
| limit | integer or null | The organization's concurrency allotment for the active plan. null means unlimited (no concurrency cap). |
| used_cost | integer | Sum of the concurrency cost of all in-flight requests. When limit is not null, a new request is rejected with HTTP 429 if its cost would push used_cost above limit. |
| request_count | integer | Number of requests currently in flight for the organization. |
| requests | array | One entry per in-flight request. See the table below. |
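The admission rule for used_cost and limit can be mirrored on the client, for example to let an autoscaler hold back work before it would be rejected. This is a minimal sketch assuming a payload already decoded into a dict; headroom and would_fit are hypothetical helper names. Because a frame can be up to two seconds old and other clients may start requests in between, the result is only advisory; the server's HTTP 429 remains authoritative.

```python
from typing import Optional


def headroom(snapshot: dict) -> Optional[int]:
    """Concurrency cost still available, or None when the plan is unlimited."""
    limit = snapshot["limit"]
    if limit is None:
        return None  # null limit: no concurrency cap
    # Clamp at zero in case used_cost is ever reported above limit.
    return max(limit - snapshot["used_cost"], 0)


def would_fit(snapshot: dict, cost: int) -> bool:
    """True if a request of this cost would not push used_cost above limit."""
    room = headroom(snapshot)
    return room is None or cost <= room
```

With the example payload (limit 8, used_cost 4), a request of cost 4 fits exactly, while a request of cost 5 would be expected to receive a 429.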

requests[] fields

| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the in-flight request. |
| cost | integer | Concurrency cost of this request, derived from the model size. See Concurrency Limits for the size-to-cost mapping. |
| model | string | Identifier of the model handling the request. |
| started_at | integer | Unix epoch time, in milliseconds, at which the request began. |
| duration_ms | integer | Milliseconds elapsed since the request started, recomputed on every frame. This value grows across successive events while the request is still in flight. |
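The per-request fields lend themselves to simple dashboard computations. The sketch below, which uses hypothetical helper names, finds the longest-running request in a frame, converts started_at from epoch milliseconds to a UTC datetime, and computes an age on the client's own clock. That local age is an approximation: it can differ from the server's duration_ms because of clock skew and because the frame may have been emitted up to two seconds earlier.

```python
import time
from datetime import datetime, timezone
from typing import Optional


def longest_running(snapshot: dict) -> Optional[dict]:
    """Return the in-flight request with the largest duration_ms, or None."""
    return max(snapshot.get("requests", []),
               key=lambda r: r["duration_ms"], default=None)


def started_at_utc(request: dict) -> datetime:
    """Convert the epoch-millisecond started_at field to an aware UTC datetime."""
    return datetime.fromtimestamp(request["started_at"] / 1000, tz=timezone.utc)


def local_age_ms(request: dict, now_ms: Optional[int] = None) -> int:
    """Age measured on the client clock (approximate; see note above)."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    return now_ms - request["started_at"]
```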

Example

```shell
curl --no-buffer https://api.featherless.ai/account/concurrency/stream \
  -H "Accept: text/event-stream" \
  -H "Authorization: Bearer $FEATHERLESS_API_KEY"
```
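For clients written in Python, the same request can be made with the standard library. This is a sketch, not an official client: build_request and stream_frames are hypothetical names, the opener parameter exists only so the network call can be swapped out in tests, and the line handling assumes each event is a single data: line, as in the example frames. Closing the response is what ends the stream, since the server keeps it open until the client disconnects.

```python
import json
import urllib.request
from typing import Callable, Iterator

STREAM_URL = "https://api.featherless.ai/account/concurrency/stream"


def build_request(api_key: str) -> urllib.request.Request:
    """Same request the curl example sends: SSE Accept header plus Bearer auth."""
    return urllib.request.Request(
        STREAM_URL,
        headers={
            "Accept": "text/event-stream",
            "Authorization": f"Bearer {api_key}",
        },
    )


def stream_frames(api_key: str, max_frames: int,
                  opener: Callable = urllib.request.urlopen) -> Iterator[dict]:
    """Yield up to max_frames decoded payloads, then close the connection."""
    with opener(build_request(api_key)) as resp:
        count = 0
        for raw in resp:  # the response object iterates line by line, as bytes
            line = raw.decode("utf-8").strip()
            if not line.startswith("data:"):
                continue  # skip blank separators and any other SSE fields
            yield json.loads(line[len("data:"):])
            count += 1
            if count >= max_frames:
                break  # leaving the with-block closes the stream
```

For example, iterating over stream_frames(key, max_frames=3) prints three successive snapshots and then disconnects.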

Example frames emitted by the server (one every two seconds), covering a request that starts and then finishes:

```text
data: {"limit":8,"used_cost":0,"request_count":0,"requests":[]}

data: {"limit":8,"used_cost":4,"request_count":1,"requests":[{"id":"5c4e82ea-2023-46ac-b6a5-025cf2f312fe","cost":4,"model":"moonshotai/Kimi-K2.6","started_at":1776783127496,"duration_ms":82}]}

data: {"limit":8,"used_cost":4,"request_count":1,"requests":[{"id":"5c4e82ea-2023-46ac-b6a5-025cf2f312fe","cost":4,"model":"moonshotai/Kimi-K2.6","started_at":1776783127496,"duration_ms":2083}]}

data: {"limit":8,"used_cost":0,"request_count":0,"requests":[]}
```
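A sequence like this can be turned into start and finish events by comparing successive frames by request id. The sketch below uses a hypothetical diff_frames helper. Because frames are sampled every two seconds, a request that starts and finishes between two frames may never appear at all, so counts derived this way are likely to undercount very short requests.

```python
from typing import Dict, List, Tuple


def diff_frames(prev: dict, curr: dict) -> Tuple[List[dict], List[str]]:
    """Compare two successive payloads by request id.

    Returns (started, finished): request objects that are new in curr,
    and the sorted ids that were in prev but are gone from curr.
    """
    prev_ids = {r["id"] for r in prev["requests"]}
    curr_by_id: Dict[str, dict] = {r["id"]: r for r in curr["requests"]}
    started = [r for rid, r in curr_by_id.items() if rid not in prev_ids]
    finished = sorted(prev_ids - set(curr_by_id))
    return started, finished
```

Applied to the four example frames in order, this reports the 5c4e82ea… request as started between the first and second frames, no change between the second and third, and finished between the third and fourth.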

Snapshot endpoint

For one-off polling instead of a long-lived stream, use GET /account/concurrency. It returns the same payload as a single stream frame, but as a regular JSON response body rather than an SSE event, and then closes the connection.
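A minimal polling sketch for the snapshot endpoint follows. It assumes the endpoint lives on the same host as the stream (https://api.featherless.ai/account/concurrency) and accepts the same Bearer authentication; fetch_snapshot and poll are hypothetical names, and the opener and sleep parameters exist only so the network and timer can be replaced in tests. For continuous monitoring the stream is likely the better fit; polling suits occasional checks such as cron jobs.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator

# Assumption: snapshot endpoint on the same host as the stream endpoint.
SNAPSHOT_URL = "https://api.featherless.ai/account/concurrency"


def fetch_snapshot(api_key: str,
                   opener: Callable = urllib.request.urlopen) -> dict:
    """One-off GET of the current concurrency payload as plain JSON."""
    req = urllib.request.Request(
        SNAPSHOT_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with opener(req) as resp:
        return json.loads(resp.read())  # json.loads accepts bytes


def poll(api_key: str, interval_s: float, rounds: int,
         opener: Callable = urllib.request.urlopen,
         sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Yield `rounds` snapshots, sleeping interval_s between requests."""
    for i in range(rounds):
        if i:
            sleep(interval_s)  # no sleep before the first request
        yield fetch_snapshot(api_key, opener)
```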

Last edited: Apr 22, 2026