Realtime API Beta
Featherless's real-time API is now available in closed beta.
In October 2024, OpenAI launched a real-time audio protocol. This protocol, which runs over either a WebSocket or a WebRTC connection, is a stateful, real-time native interface intended to support the development of speech-to-speech applications with low-latency requirements. Like their Assistants API, this API is still in beta today.
Today, February 23rd, 2025, Featherless is pleased to announce that our real-time audio API is now available as a private beta. It allows any of the 3,800+ models in our catalogue to be used via the OpenAI Realtime audio protocol, with a Speech-to-Text (STT) model as the input processor and a Text-to-Speech (TTS) model as the output processor. A real-time session consumes 2 concurrency units for the STT and TTS layers, plus whatever concurrency the underlying model needs. This means that Premium subscribers can use one real-time connection with a model of up to 34B parameters.
At this time, our API implements a subset of OpenAI's specification and will be of most interest to developers who are building client applications and are familiar with OpenAI's real-time API. As our list of supported applications grows, we will expand the beta to end users; we plan to support it first via our assistant app, Phoenix, and later in Wyvern.
To join the private beta, please email us at [email protected].
Support
Our API follows OpenAI's real-time API definition closely (official docs here and here). We follow the same session establishment mechanism and support both WebSockets and WebRTC. We do not currently support voice activity detection (VAD), which means a client application must let the user take an action outside of the audio channel to trigger an interaction (see the sketch after the tables below). We support a subset of the events in the spec, specifically:
Client Events
| Event | Currently supported? | Reference |
| --- | --- | --- |
| | | |
| | ✅ | |
| | ✅ | |
| | | |
| | ✅ | |
| | | |
| | | |
| | ✅ | |
| | | |
Server Events
| Event | Generated? | Reference |
| --- | --- | --- |
| | | |
| | ✅ | |
| | | |
| | ✅ | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | | |
| | | |
| | | |
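Because VAD is not supported, the client drives each turn explicitly. The sketch below shows what a "push-to-talk" style trigger might look like; the event shapes follow OpenAI's realtime spec, and the helper name requestResponse, along with the assumption that input_audio_buffer.commit and response.create are among the supported client events, are ours.

// Hypothetical helper: commit the audio captured so far and ask the model
// to respond. `send` is whatever transport the client uses (WebRTC data
// channel or WebSocket) - see the session establishment examples below.
function requestResponse(send) {
  // Close out the user's turn by committing the input audio buffer
  send({ type: "input_audio_buffer.commit" });
  // Ask the model to generate a response for the conversation so far
  send({ type: "response.create" });
}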
Session Establishment
WebRTC
WebRTC is the intended form of connection for browser-based applications. WebRTC session establishment happens in three parts:
1. the server generates an ephemeral key for the current session
2. the client uses the ephemeral key to obtain a WebRTC session description
3. the client connects an RTCPeerConnection object via that session description
The endpoints needed for steps 1 and 2 above are https://api.featherless.ai/v1/realtime/sessions and https://api.featherless.ai/v1/realtime.
Server-side code for generating the ephemeral key for the current session might look as follows:
import express from "express";

const app = express();

// An endpoint the browser client below can call - it returns the JSON
// from a REST API request to this protected endpoint
app.get("/session", async (req, res) => {
  const r = await fetch("https://api.featherless.ai/v1/realtime/sessions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.FEATHERLESS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "recursal/QRWKV6-32B-Instruct-Preview-v0.1",
      voice: "Darok/tommy",
    }),
  });
  const data = await r.json();

  // Send back the JSON we received from the Featherless REST API
  res.send(data);
});

app.listen(3000);
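The JSON returned by the sessions endpoint includes a client_secret.value field; this is the ephemeral key the browser uses in the client code below.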
The client-side code for connecting via WebRTC might look like the following:
async function init() {
  // Get an ephemeral key from your server - see server code above
  const tokenResponse = await fetch("/session");
  const data = await tokenResponse.json();
  const EPHEMERAL_KEY = data.client_secret.value;

  // Create a peer connection
  const pc = new RTCPeerConnection();

  // Set up to play remote audio from the model
  const audioEl = document.createElement("audio");
  audioEl.autoplay = true;
  pc.ontrack = e => audioEl.srcObject = e.streams[0];

  // Add local audio track for microphone input in the browser
  const ms = await navigator.mediaDevices.getUserMedia({
    audio: true
  });
  pc.addTrack(ms.getTracks()[0]);

  // Set up data channel for sending and receiving events
  const dc = pc.createDataChannel("oai-events");
  dc.addEventListener("message", (e) => {
    // Realtime server events appear here!
    console.log(e);
  });

  // Start the session using the Session Description Protocol (SDP)
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  const baseUrl = "https://api.featherless.ai/v1/realtime";
  const model = "recursal/QRWKV6-32B-Instruct-Preview-v0.1";
  const sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
    method: "POST",
    body: offer.sdp,
    headers: {
      Authorization: `Bearer ${EPHEMERAL_KEY}`,
      "Content-Type": "application/sdp"
    },
  });

  const answer = {
    type: "answer",
    sdp: await sdpResponse.text(),
  };
  await pc.setRemoteDescription(answer);
}

init();
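Client events are sent over the "oai-events" data channel as JSON strings, and server events arrive on the same channel (the message listener above). As a sketch, inside init() after creating dc, the hypothetical requestResponse helper from earlier could be wired up like this:

dc.addEventListener("open", () => {
  // Serialise client events (e.g. from the requestResponse sketch above) as JSON
  requestResponse((event) => dc.send(JSON.stringify(event)));
});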
WebSockets
The WebSocket transport is intended for server-to-server communication scenarios.
import WebSocket from "ws";

const url = "wss://api.featherless.ai/v1/realtime?model=recursal/QRWKV6-32B-Instruct-Preview-v0.1";
const ws = new WebSocket(url, {
  headers: {
    "Authorization": "Bearer " + process.env.FEATHERLESS_API_KEY,
    "OpenAI-Beta": "realtime=v1",
  },
});

ws.on("open", function open() {
  console.log("Connected to server.");
});

ws.on("message", function incoming(message) {
  console.log(JSON.parse(message.toString()));
});
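Client events are sent over the same socket as JSON-encoded messages. As a sketch (assuming response.create is among the supported client events, per the table above), the open handler could be extended to request a response:

ws.on("open", function open() {
  console.log("Connected to server.");
  // Ask the model to respond to the current conversation state
  ws.send(JSON.stringify({ type: "response.create" }));
});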
Supported Voices
Darok/america
Darok/joshua
Darok/paola
Darok/jessica
Darok/grace
Darok/maya
Darok/knightley
Darok/myriam
Darok/tommy
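Any of these can be passed as the voice parameter when creating a session, as in the server-side example above.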