/account/concurrency/stream
Live stream of in-flight inference requests and concurrency usage for your organization.
The concurrency stream endpoint publishes a live view of the inference requests currently running against your organization's concurrency allotment. It is designed for dashboards, autoscalers and any client that needs near-real-time visibility into how much of the plan's concurrent-request budget is in use.
HTTP request
GET https://api.featherless.ai/account/concurrency/stream
Authentication
Authenticate with either an API key via the Authorization header, or with an active browser session cookie. The returned usage is always scoped to the authenticated user's active organization.
Streaming format
The response is a Server-Sent Events (SSE) stream with content type text/event-stream. The server emits one event immediately on connect and then one event every two seconds thereafter, in the standard data: {…}\n\n frame format. The stream remains open until the client disconnects.
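The framing described above can be decoded with a small line-based parser. The following is a minimal sketch rather than an official client: the function name iter_sse_json is hypothetical, and it assumes each event carries one JSON object in its data: field, as in the frames documented on this page.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE event.

    An event is a run of `data:` lines terminated by a blank line.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: end of the current event, if one was started.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Other lines (comments, `event:`, `id:`) are ignored in this sketch.


# Two frames copied from the examples on this page, with their blank-line terminators.
sample = [
    'data: {"limit":8,"used_cost":0,"request_count":0,"requests":[]}\n',
    "\n",
    'data: {"limit":8,"used_cost":4,"request_count":1,"requests":[{"id":"5c4e82ea-2023-46ac-b6a5-025cf2f312fe","cost":4,"model":"moonshotai/Kimi-K2.6","started_at":1776783127496,"duration_ms":82}]}\n',
    "\n",
]

events = list(iter_sse_json(sample))
```

In a real client, `lines` would be whatever iterator of decoded text lines the HTTP library exposes for a streaming response. The parser itself does no network I/O, so it can be tested against recorded frames.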
Response schema
{
  "limit": 8,
  "used_cost": 4,
  "request_count": 1,
  "requests": [
    {
      "id": "5c4e82ea-2023-46ac-b6a5-025cf2f312fe",
      "cost": 4,
      "model": "moonshotai/Kimi-K2.6",
      "started_at": 1776783127496,
      "duration_ms": 4084
    }
  ]
}
Top-level fields
| Field | Type | Description |
|---|---|---|
| limit | integer or null | The organization's concurrency allotment for the active plan. |
| used_cost | integer | Sum of the concurrency cost of all in-flight requests. A new request is rejected with HTTP 429 if it would push used_cost above limit. |
| request_count | integer | Number of requests currently in flight for the organization. |
| requests | array | One entry per in-flight request. See the table below. |
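A dashboard or autoscaler will typically reduce these fields to a single headroom number. The sketch below is one way to do that, under two assumptions that are not stated on this page: a request is admitted only while used_cost plus its own cost stays within limit, and a null limit means the headroom is unknown. The helper names remaining_capacity and would_fit are hypothetical.

```python
from typing import Optional


def remaining_capacity(frame: dict) -> Optional[int]:
    """Return limit - used_cost, or None when the frame reports a null limit."""
    limit = frame["limit"]
    if limit is None:
        # Assumption: null means no known limit, so headroom is unknown.
        return None
    return limit - frame["used_cost"]


def would_fit(frame: dict, cost: int) -> Optional[bool]:
    """Estimate whether a request with the given concurrency cost would fit."""
    remaining = remaining_capacity(frame)
    if remaining is None:
        return None
    return cost <= remaining


# Frame taken from the response schema example on this page.
frame = {"limit": 8, "used_cost": 4, "request_count": 1, "requests": []}
headroom = remaining_capacity(frame)
```

Because frames arrive every two seconds, this is only an estimate. Another request may start between the last frame and your own, so clients should still handle HTTP 429.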
requests[] fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the in-flight request. |
| cost | integer | Concurrency cost of this request, derived from the model size. See Concurrency Limits for the size-to-cost mapping. |
| model | string | Identifier of the model handling the request. |
| started_at | integer | Unix epoch milliseconds when the request began. |
| duration_ms | integer | Milliseconds elapsed since the request started, recomputed on every frame. This value grows across successive events while the request is still in flight. |
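As an illustration of working with the per-request fields, the sketch below converts started_at from epoch milliseconds to a timezone-aware datetime and flags requests whose duration_ms exceeds a threshold. The function name describe_requests and the 30-second default are hypothetical choices for the example, not part of the API.

```python
from datetime import datetime, timezone
from typing import List


def describe_requests(frame: dict, slow_after_ms: int = 30_000) -> List[dict]:
    """Summarize each in-flight request in a frame."""
    rows = []
    for req in frame["requests"]:
        # started_at is in milliseconds; datetime expects seconds.
        started = datetime.fromtimestamp(req["started_at"] / 1000, tz=timezone.utc)
        rows.append({
            "id": req["id"],
            "model": req["model"],
            "cost": req["cost"],
            "started": started,
            "slow": req["duration_ms"] >= slow_after_ms,
        })
    return rows


# Frame taken from the response schema example on this page.
frame = {
    "limit": 8,
    "used_cost": 4,
    "request_count": 1,
    "requests": [{
        "id": "5c4e82ea-2023-46ac-b6a5-025cf2f312fe",
        "cost": 4,
        "model": "moonshotai/Kimi-K2.6",
        "started_at": 1776783127496,
        "duration_ms": 4084,
    }],
}
rows = describe_requests(frame)
```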
Example
curl --no-buffer https://api.featherless.ai/account/concurrency/stream \
  -H "Accept: text/event-stream" \
  -H "Authorization: Bearer $FEATHERLESS_API_KEY"
Example frames emitted by the server (one every two seconds):
data: {"limit":8,"used_cost":0,"request_count":0,"requests":[]}
data: {"limit":8,"used_cost":4,"request_count":1,"requests":[{"id":"5c4e82ea-2023-46ac-b6a5-025cf2f312fe","cost":4,"model":"moonshotai/Kimi-K2.6","started_at":1776783127496,"duration_ms":82}]}
data: {"limit":8,"used_cost":4,"request_count":1,"requests":[{"id":"5c4e82ea-2023-46ac-b6a5-025cf2f312fe","cost":4,"model":"moonshotai/Kimi-K2.6","started_at":1776783127496,"duration_ms":2083}]}
data: {"limit":8,"used_cost":0,"request_count":0,"requests":[]}
Snapshot endpoint
For one-off polling instead of a long-lived stream, use GET /account/concurrency. It returns the same payload as a single SSE frame, as regular JSON, and closes the connection immediately.
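A one-off poll might look like the following sketch, which uses only the Python standard library. It assumes the snapshot endpoint lives on the same host as the stream and accepts the same Bearer API key shown in the curl example. The helper names build_snapshot_request and fetch_snapshot are hypothetical.

```python
import json
import urllib.request

# Assumption: same host as the stream endpoint documented on this page.
SNAPSHOT_URL = "https://api.featherless.ai/account/concurrency"


def build_snapshot_request(api_key: str) -> urllib.request.Request:
    """Build the authenticated GET request without sending it."""
    return urllib.request.Request(
        SNAPSHOT_URL,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_snapshot(api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body into a dict."""
    request = build_snapshot_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Example usage (performs a network call, so it is left commented out):
# import os
# snapshot = fetch_snapshot(os.environ["FEATHERLESS_API_KEY"])
# print(snapshot["used_cost"], "of", snapshot["limit"])
```

Keeping request construction separate from the network call makes the headers and URL easy to check without contacting the API.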