Getting Started

The Featherless Serverless AI platform.


What is Featherless?

Featherless is a serverless AI inference platform. Our goal is to make all AI models available for serverless inference, and we’ve started with llama-based text generation models (e.g. Llama, Mistral, Qwen).

We provide inference via API to a continually expanding library of open-weight models, including the most popular models for role-playing, creative writing, coding assistance, and more. See here for details on our model catalog and what makes a model compatible.

Our API interface is OpenAI compatible, meaning any client program that works with OpenAI as an AI/inference provider can be reconfigured to use featherless with little effort.

We have guides for how to use Featherless with the most popular client programs (e.g. SillyTavern, Typing mind, Aider) as well as documentation on all API endpoints (most important of which being /completions and /chat/completions) for integration directly from software.

Why Choose Featherless?

Featherless is a serverless provider with unique model loading and GPU orchestration abilities that allows us to keep an exceptionally large catalog of models online.

Other providers either offer low cost of access (e.g. openrouter, AWS bedrock) but with a limited set of models, or an unlimited range of models (e.g. runpod) but with users managing servers and the associated costs of operation (e.g. > $2/hour for sufficient GPUs to run a 70B model).

Featherless provides the best of both worlds offering unmatched model range and variety but with serverless pricing.

Provider

Cost

Speed

Choice

runpod

✅ (thousands)

hugging face inference

✅ (thousands)

anthropic

❌ (<10 models)

openrouter

❌ (~200 models)

Featherless

✅ (thousands)

Plans

We give serverless access to models, meaning users do not need to think about or manage servers to use the models.

Our plans are subscription and concurrency based. A user with a paid subscription is able to access all models up to a given size, and with a fixed number of concurrent requests (but no limits on number of requests made in a monthly period)

Featherless offers two consumer plans:

  • Basic: $10 per month, models up to 15B

  • Premium: $25 per month, models up to 72B

And one scalable business plan:

  • Scale: $75 per scale unit, 2x Premium models or 6x Basic models

Scale customers can also run inference against private models from a connected hugging face account, provided the model is one of the compatible architectures.

Privacy and Logging

Featherless does not log chats, prompts, or completions. For more details, see complete privacy policy here.

If you have any questions or need further assistance, please join the Featherless AI Discord.

Welcome to the world of Featherless AI, where you can use more models with ease!