Realtime API Beta
Featherless's real-time API is now available in closed beta.
In October 2024, OpenAI launched a real-time audio protocol. This protocol, which runs over either a WebSocket or a WebRTC connection, is a stateful, real-time native interface intended to support the development of speech-to-speech applications with low-latency requirements. Like their Assistants API, this API is still in beta today.
Today, February 23rd, 2025, Featherless is pleased to announce that our real-time audio API is now available as a private beta. It allows any of the 3,800+ models in our catalogue to be used via the OpenAI Realtime audio protocol, with a Speech-to-Text (STT) model as the input processor and a Text-to-Speech (TTS) model as the output processor. A real-time session consumes 2 concurrency units for the STT and TTS layers, plus whatever concurrency the underlying model needs. This means that Premium subscribers can use one real-time connection with a model of up to 34B parameters.
At this time, our API implements a subset of OpenAI's specification and will be of most interest to developers who are building client applications and are familiar with OpenAI's real-time API. As our list of supported applications grows, we will expand the beta to end users; we plan to support it first via our assistant app, Phoenix, and later in Wyvern.
To join the private beta, please email us at [email protected].
Support
Our API follows OpenAI's real-time API definition closely (official docs here and here). We follow the same session establishment mechanism and support both WebSockets and WebRTC. We do not currently support voice activity detection (VAD), which means a client application must let the user take an action outside of the audio channel to trigger an interaction (see the sketch after the tables below). We support a subset of the events in the spec, specifically:
Client Events
| Event | Currently supported? | Reference |
| --- | --- | --- |
| | | |
| | ✅ | |
| | ✅ | |
| | | |
| | ✅ | |
| | | |
| | | |
| | ✅ | |
| | | |
Server Events
| Event | Generated? | Reference |
| --- | --- | --- |
| | | |
| | ✅ | |
| | | |
| | ✅ | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | | |
| | | |
| | | |
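Because VAD is not supported, the client drives each turn explicitly. The sketch below shows what a "push-to-talk" style trigger might look like; the event shapes follow OpenAI's realtime spec, and the helper name requestResponse, along with the assumption that input_audio_buffer.commit and response.create are among the supported client events, are ours.

// Hypothetical helper: commit the audio captured so far and ask the model
// to respond. `send` is whatever transport the client uses (WebRTC data
// channel or WebSocket) - see the session establishment examples below.
function requestResponse(send) {
  // Close out the user's turn by committing the input audio buffer
  send({ type: "input_audio_buffer.commit" });
  // Ask the model to generate a response for the conversation so far
  send({ type: "response.create" });
}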
Session Establishment
WebRTC
WebRTC is the intended form of connection for browser-based applications. WebRTC session establishment happens in three parts:
1. the server generates an ephemeral key for the current session
2. the client uses the ephemeral key to obtain a WebRTC session description
3. the client connects an RTCPeerConnection object via that session description
The endpoints needed for steps 1 and 2 above are https://api.featherless.ai/v1/realtime/sessions and https://api.featherless.ai/v1/realtime.
Server-side code for generating the ephemeral key for the current session might look as follows:
import express from "express";

const app = express();

// An endpoint the browser client below can call - it returns the JSON
// from a REST API request to this protected endpoint
app.get("/session", async (req, res) => {
  const r = await fetch("https://api.featherless.ai/v1/realtime/sessions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.FEATHERLESS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "recursal/QRWKV6-32B-Instruct-Preview-v0.1",
      voice: "Darok/tommy",
    }),
  });
  const data = await r.json();

  // Send back the JSON we received from the Featherless REST API
  res.send(data);
});

app.listen(3000);
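The JSON returned by the sessions endpoint includes a client_secret.value field; this is the ephemeral key the browser uses in the client code below.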
The client-side code for connecting via WebRTC might look like the following:
async function init() {
  // Get an ephemeral key from your server - see server code above
  const tokenResponse = await fetch("/session");
  const data = await tokenResponse.json();
  const EPHEMERAL_KEY = data.client_secret.value;

  // Create a peer connection
  const pc = new RTCPeerConnection();

  // Set up to play remote audio from the model
  const audioEl = document.createElement("audio");
  audioEl.autoplay = true;
  pc.ontrack = e => audioEl.srcObject = e.streams[0];

  // Add local audio track for microphone input in the browser
  const ms = await navigator.mediaDevices.getUserMedia({
    audio: true
  });
  pc.addTrack(ms.getTracks()[0]);

  // Set up data channel for sending and receiving events
  const dc = pc.createDataChannel("oai-events");
  dc.addEventListener("message", (e) => {
    // Realtime server events appear here!
    console.log(e);
  });

  // Start the session using the Session Description Protocol (SDP)
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  const baseUrl = "https://api.featherless.ai/v1/realtime";
  const model = "recursal/QRWKV6-32B-Instruct-Preview-v0.1";
  const sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
    method: "POST",
    body: offer.sdp,
    headers: {
      Authorization: `Bearer ${EPHEMERAL_KEY}`,
      "Content-Type": "application/sdp"
    },
  });

  const answer = {
    type: "answer",
    sdp: await sdpResponse.text(),
  };
  await pc.setRemoteDescription(answer);
}

init();
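Client events are sent over the "oai-events" data channel as JSON strings, and server events arrive on the same channel (the message listener above). As a sketch, inside init() after creating dc, the hypothetical requestResponse helper from earlier could be wired up like this:

dc.addEventListener("open", () => {
  // Serialise client events (e.g. from the requestResponse sketch above) as JSON
  requestResponse((event) => dc.send(JSON.stringify(event)));
});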
WebSockets
The WebSocket transport is intended for server-to-server communication scenarios.
import WebSocket from "ws";

const url = "wss://api.featherless.ai/v1/realtime?model=recursal/QRWKV6-32B-Instruct-Preview-v0.1";
const ws = new WebSocket(url, {
  headers: {
    "Authorization": "Bearer " + process.env.FEATHERLESS_API_KEY,
    "OpenAI-Beta": "realtime=v1",
  },
});

ws.on("open", function open() {
  console.log("Connected to server.");
});

ws.on("message", function incoming(message) {
  console.log(JSON.parse(message.toString()));
});
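Client events are sent over the same socket as JSON-encoded messages. As a sketch (assuming response.create is among the supported client events, per the table above), the open handler could be extended to request a response:

ws.on("open", function open() {
  console.log("Connected to server.");
  // Ask the model to respond to the current conversation state
  ws.send(JSON.stringify({ type: "response.create" }));
});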
Supported Voices
Darok/america
Darok/joshua
Darok/paola
Darok/jessica
Darok/grace
Darok/maya
Darok/knightley
Darok/myriam
Darok/tommy
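Any of these can be passed as the voice parameter when creating a session, as in the server-side example above.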