/account/concurrency/stream

Live stream of in-flight inference requests and concurrency usage for your organization.

The concurrency stream endpoint publishes a live view of the inference requests currently running against your organization's concurrency allotment. It is designed for dashboards, autoscalers, and any other client that needs near-real-time visibility into how much of your plan's concurrent-request budget is in use.

HTTP request

GET https://api.featherless.ai/account/concurrency/stream

Authentication

Authenticate with either an API key sent as a Bearer token in the `Authorization` header, or an active browser session cookie. The returned usage is always scoped to the authenticated user's active organization.

Streaming format

The response is a Server-Sent Events (SSE) stream with content type `text/event-stream`. The server emits one event immediately on connect and then one event every two seconds thereafter, in the standard `data: {…}\n\n` frame format. The stream remains open until the client disconnects.
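A minimal sketch of how a client might decode these frames, assuming each event carries its JSON payload in `data:` lines as in the examples on this page. The name `iter_payloads` is illustrative and not part of any official SDK; it takes any iterable of already-decoded text lines, so it can be fed from an HTTP response body or from a fixture.

```python
import json
from typing import Iterable, Iterator


def iter_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE event.

    Multiple data: lines within one event are joined with newlines,
    following the SSE format. Other fields (event:, id:, retry:) are ignored.
    """
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
            continue
        if line.startswith(":"):
            # SSE comment line (sometimes used as a keep-alive); skip it.
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # The SSE format strips one optional leading space from the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
```

Keeping the parser independent of the transport means the same function works whether the lines come from a streaming HTTP library or from recorded output.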

Response schema

{
  "limit": 8,
  "used_cost": 4,
  "request_count": 1,
  "requests": [
    {
      "id": "5c4e82ea-2023-46ac-b6a5-025cf2f312fe",
      "cost": 4,
      "model": "moonshotai/Kimi-K2.6",
      "started_at": 1776783127496,
      "duration_ms": 4084
    }
  ]
}

Top-level fields

| Field | Type | Description |
| --- | --- | --- |
| `limit` | integer \| null | The organization's concurrency allotment for the active plan. `null` means unlimited (no concurrency cap). |
| `used_cost` | integer | Sum of the concurrency cost of all in-flight requests. A new request is rejected with HTTP 429 if it would push `used_cost` above `limit`. |
| `request_count` | integer | Number of requests currently in flight for the organization. |
| `requests` | array | One entry per in-flight request. See the table below. |
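The relationship between `limit` and `used_cost` can be sketched as a small client-side check. This is an illustration under the rule stated above (rejection when the new total would exceed `limit`); the helper names `remaining_budget` and `would_fit` are hypothetical, and the server remains the authority, since a frame can be out of date by the time a request is sent.

```python
from typing import Optional


def remaining_budget(payload: dict) -> Optional[int]:
    """Return the unused concurrency cost, or None when the plan is unlimited."""
    limit = payload["limit"]
    if limit is None:
        return None
    return max(limit - payload["used_cost"], 0)


def would_fit(payload: dict, cost: int) -> bool:
    """Estimate whether a new request with the given cost would be accepted."""
    limit = payload["limit"]
    if limit is None:
        # null limit: no concurrency cap applies.
        return True
    # Rejection happens only if the new total would rise above the limit.
    return payload["used_cost"] + cost <= limit
```

An autoscaler or request queue could use `would_fit` to hold back work locally rather than sending requests that are likely to receive HTTP 429.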

requests[] fields

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Unique identifier for the in-flight request. |
| `cost` | integer | Concurrency cost of this request, derived from the model size. See Concurrency Limits for the size-to-cost mapping. |
| `model` | string | Identifier of the model handling the request. |
| `started_at` | integer | Unix epoch milliseconds when the request began. |
| `duration_ms` | integer | Milliseconds elapsed since the request started, recomputed on every frame. This value grows across successive events while the request is still in flight. |
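As a rough illustration of working with the per-request fields, the sketch below converts `started_at` to a timezone-aware UTC datetime and picks out requests that have been running longer than a threshold. The helper names and the threshold parameter are hypothetical choices for this example.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime; adding a timedelta of integer
# milliseconds avoids floating-point rounding.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def started_at_utc(entry: dict) -> datetime:
    """Convert a request's started_at (epoch milliseconds) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=entry["started_at"])


def slow_requests(payload: dict, threshold_ms: int) -> list:
    """Return in-flight requests at or above threshold_ms, longest first."""
    slow = [r for r in payload["requests"] if r["duration_ms"] >= threshold_ms]
    return sorted(slow, key=lambda r: r["duration_ms"], reverse=True)
```

Because `duration_ms` is recomputed by the server on every frame, a client does not need to compare `started_at` against its own clock to find long-running requests.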

Example

curl --no-buffer https://api.featherless.ai/account/concurrency/stream \
  -H "Accept: text/event-stream" \
  -H "Authorization: Bearer $FEATHERLESS_API_KEY"

Example frames emitted by the server (one every two seconds):

data: {"limit":8,"used_cost":0,"request_count":0,"requests":[]}

data: {"limit":8,"used_cost":4,"request_count":1,"requests":[{"id":"5c4e82ea-2023-46ac-b6a5-025cf2f312fe","cost":4,"model":"moonshotai/Kimi-K2.6","started_at":1776783127496,"duration_ms":82}]}

data: {"limit":8,"used_cost":4,"request_count":1,"requests":[{"id":"5c4e82ea-2023-46ac-b6a5-025cf2f312fe","cost":4,"model":"moonshotai/Kimi-K2.6","started_at":1776783127496,"duration_ms":2083}]}

data: {"limit":8,"used_cost":0,"request_count":0,"requests":[]}
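The frames above show a request appearing and then disappearing between events. A dashboard could detect such transitions by comparing the request ids in successive frames, as in this sketch. The function name `diff_frames` is illustrative. Since frames arrive two seconds apart, a request that starts and finishes between two frames would presumably never be seen by this approach.

```python
from typing import List, Tuple


def diff_frames(previous: dict, current: dict) -> Tuple[List[str], List[str]]:
    """Return (started_ids, finished_ids) between two decoded frames."""
    prev_ids = {r["id"] for r in previous["requests"]}
    curr_ids = {r["id"] for r in current["requests"]}
    started = sorted(curr_ids - prev_ids)    # present now, absent before
    finished = sorted(prev_ids - curr_ids)   # present before, absent now
    return started, finished
```

Applied to the four example frames in order, this reports the request as started at the second frame, unchanged at the third, and finished at the fourth.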

Snapshot endpoint

For one-off polling instead of a long-lived stream, use `GET /account/concurrency`. It returns the same payload that a single SSE frame carries, as a regular JSON response, and then closes the connection immediately.
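A minimal polling sketch using only the Python standard library. It assumes the snapshot endpoint lives on the same host as the stream (`https://api.featherless.ai`) and accepts the same Bearer API key; the `Accept: application/json` header and the function names are also assumptions made for this example.

```python
import json
import urllib.request

# Assumed full URL: same host as the stream endpoint, path from this page.
SNAPSHOT_URL = "https://api.featherless.ai/account/concurrency"


def build_snapshot_request(api_key: str) -> urllib.request.Request:
    """Build the GET request for a single concurrency snapshot."""
    return urllib.request.Request(
        SNAPSHOT_URL,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",  # assumption; may not be required
        },
        method="GET",
    )


def fetch_snapshot(api_key: str, timeout: float = 10.0) -> dict:
    """Fetch one snapshot and decode the JSON body into a dict."""
    request = build_snapshot_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_snapshot(os.environ["FEATHERLESS_API_KEY"])` after `import os` would return a dict with the same top-level fields described above, provided the assumptions hold.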

Last edited: Apr 22, 2026