Getting Started
The Featherless Serverless AI platform.
What is Featherless?
Featherless is a serverless AI inference platform. Our goal is to make all AI models available for serverless inference, and we’ve started with llama-based text generation models (e.g. Llama, Mistral, Qwen).
We provide inference via API to a continually expanding library of open-weight models, including the most popular models for role-playing, creative writing, coding assistance, and more. See here for details on our model catalog and what makes a model compatible.
Our API interface is OpenAI compatible, meaning any client program that works with OpenAI as an AI/inference provider can be reconfigured to use featherless with little effort.
We have guides for how to use Featherless with the most popular client programs (e.g. SillyTavern, Typing mind, Aider) as well as documentation on all API endpoints (most important of which being /completions and /chat/completions) for integration directly from software.
Why Choose Featherless?
Featherless is a serverless provider with unique model loading and GPU orchestration abilities that allows us to keep an exceptionally large catalog of models online.
Other providers either offer low cost of access (e.g. openrouter, AWS bedrock) but with a limited set of models, or an unlimited range of models (e.g. runpod) but with users managing servers and the associated costs of operation (e.g. > $2/hour for sufficient GPUs to run a 70B model).
Featherless provides the best of both worlds offering unmatched model range and variety but with serverless pricing.
Provider | Cost | Speed | Choice |
runpod | ❌ | ✅ | ✅ (thousands) |
hugging face inference | ❌ | ✅ | ✅ (thousands) |
anthropic | ✅ | ✅ | ❌ (<10 models) |
openrouter | ✅ | ✅ | ❌ (~200 models) |
Featherless | ✅ | ✅ | ✅ (thousands) |
Plans
We give serverless access to models, meaning users do not need to think about or manage servers to use the models.
Our plans are subscription and concurrency based. A user with a paid subscription is able to access all models up to a given size, and with a fixed number of concurrent requests (but no limits on number of requests made in a monthly period)
Featherless offers two consumer plans:
Basic: $10 per month, models up to 15B
Premium: $25 per month, models up to 72B
And one scalable business plan:
Scale: $75 per scale unit, 2x Premium models or 6x Basic models
Scale customers can also run inference against private models from a connected hugging face account, provided the model is one of the compatible architectures.
Privacy and Logging
Featherless does not log chats, prompts, or completions. For more details, see complete privacy policy here.
If you have any questions or need further assistance, please join the Featherless AI Discord.
Welcome to the world of Featherless AI, where you can use more models with ease!