Embeddings

A brief guide to embedding support in Featherless

Featherless supports OpenAI-compatible embeddings requests for models that produce embedding outputs. You can use embeddings for search, similarity, clustering, classification, and retrieval pipelines.

Important Notes

Embeddings are available through POST https://api.featherless.ai/v1/embeddings.

The input field can be either a single string or an array of strings. If you send multiple strings in one request, Featherless returns one embedding per input in the same order.

To find supported models, use the model catalog and look for models with embedding support:

https://featherless.ai/models?modalities=embedding
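Because the endpoint follows the OpenAI-compatible shape, you can call it without any SDK. The sketch below builds the raw HTTP request with the standard library; the model name is illustrative, and the commented-out lines show how you would actually send it.

```python
import json
import urllib.request

def build_embeddings_request(api_key, model, texts):
    # "input" may be a single string or a list of strings; order is
    # preserved in the response.
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        "https://api.featherless.ai/v1/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# req = build_embeddings_request("your-featherless-api-key",
#                                "Qwen/Qwen3-Embedding-8B",
#                                ["hello", "world"])
# with urllib.request.urlopen(req) as resp:
#     data = json.load(resp)
```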

Quick Start

The simplest way to generate embeddings is through the OpenAI Python SDK pointed at Featherless.

Python - Quick Start
from openai import OpenAI

client = OpenAI(
    api_key="your-featherless-api-key",
    base_url="https://api.featherless.ai/v1"
)

response = client.embeddings.create(
    model="Qwen/Qwen3-Embedding-8B",
    input="Featherless makes serverless inference simple."
)

print(response.data[0].embedding[:5])
print(response.usage)

Batch Inputs

You can also send multiple strings in one request and receive one embedding per item.

Batch embeddings
from openai import OpenAI

client = OpenAI(
    api_key="your-featherless-api-key",
    base_url="https://api.featherless.ai/v1"
)

response = client.embeddings.create(
    model="Qwen/Qwen3-Embedding-8B",
    input=[
        "The cat sat on the mat.",
        "A kitten is resting on a rug.",
        "Server racks need better airflow."
    ]
)

for item in response.data:
    print(item.index, len(item.embedding))
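Once you have a batch of embeddings back, you can compare them directly; cosine similarity is the usual metric for the search and similarity use cases mentioned above. A minimal helper with no extra dependencies (the vectors below are toy stand-ins, not real model output):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With real responses you would compare response.data[i].embedding
# values, e.g. the cat/kitten sentences above should score higher
# against each other than against the server-rack sentence.
print(cosine_similarity([0.1, 0.9], [0.2, 0.8]))
```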

Request Options

Featherless supports the standard OpenAI-style embeddings request shape.

  • model: The embedding model to use.

  • input: A string or array of strings to embed.

  • encoding_format: Optional. float is the usual choice and returns each embedding as an array of numbers.

  • dimensions: Optional. Use only if your chosen model supports it.
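Put together, a request body using these fields might look like the following sketch (model name and dimensions value are illustrative; leave dimensions out unless your model supports it):

```python
import json

# Illustrative embeddings request body; "dimensions" is commented out
# because not every embedding model accepts it.
payload = {
    "model": "Qwen/Qwen3-Embedding-8B",
    "input": ["first document", "second document"],
    "encoding_format": "float",
    # "dimensions": 1024,
}

body = json.dumps(payload)
```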

Response Format

A successful response returns a list of embeddings and usage information.

Example response
{
  "object": "list",
  "model": "Qwen/Qwen3-Embedding-8B",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, 0.0789]
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
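Because the response is plain JSON, it can be consumed without the SDK as well. The sketch below parses the example response above and indexes the embeddings by their position, which is how you pair outputs back to batch inputs.

```python
import json

# The example response from above, as a raw JSON string.
raw = """
{
  "object": "list",
  "model": "Qwen/Qwen3-Embedding-8B",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, 0.0789]
    }
  ],
  "usage": {"prompt_tokens": 8, "total_tokens": 8}
}
"""

resp = json.loads(raw)
# Map each input's position to its vector so batch results can be
# matched back to the original strings.
vectors = {item["index"]: item["embedding"] for item in resp["data"]}
```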

Choosing A Model

Embedding models are separate from chat models. In Featherless, supported embedding models are surfaced as models whose output modality is embedding.

If you want a quick way to inspect support, start in the model catalog:

https://featherless.ai/models?modalities=embedding

Last edited: Apr 27, 2026