Instant, unlimited hosting for any llama model on HuggingFace.
Over 4200+ compatible models to choose from. Starting from $10/month. No server needed.

Our Models
We provide inference via API to a continually expanding library of open-weight models, including the most popular models for role-playing, creative writing, coding assistance, and more.
Ways to Use Featherless

Coding

Agents

Chat & Roleplay

Assistants

Creative Writing

Custom Applications
Why Featherless?
Featherless is a serverless AI inference provider with unique model loading and GPU orchestration abilities that makes an exceptionally large catalog of models available for users. Other providers either offer low cost of access (e.g. openrouter, AWS bedrock) but with a limited set of models, or an unlimited range of models (e.g. runpod) but with users managing servers and the associated costs of operation (e.g. > $2/hour to run a 70B).
Featherless provides the best of both worlds offering unmatched model range and variety but with serverless pricing.
Provider | Cost | Speed | Choice |
---|---|---|---|
RunPod | (thousands) | ||
HuggingFace | (thousands) | ||
Anthropic | (<10 models) | ||
OpenRouter | (~200 models) | ||
Featherless | ![]() |
Simple Pricing Unlimited Tokens
Feather Basic
Max. 15B
$ 10 USD / Month
- Use any model up to 15B in size subject to Personal Use limits*
- Private, secure, and anonymous usage - no logs
Feather Premium
All Models
$ 25 USD / Month
- Use any model (including DeepSeek R1!) subject to the following limits
- max 4 concurrent requests to models <= 15B, or
- max 2 concurrent requests to models <= 34B, or
- max 1 concurrent connection to any model >= 70B
- or any linear combination of the above
Feather Scale
Max. 72B
$ 75 USD / Month
- Business plan that can scale to arbitrarily many concurrent connections
- Each scale unit allows for
- 4 concurrent requests to models <= 15B, or
- 2 concurrent requests to models <= 34B, or
- 1 concurrent connection to any model <= 72B, or
- a linear combination of the above
- + deploy your own private models!
How many concurrencies do you need?
Feather Enterprise
Custom
- Run your entire catalog.
- From your cloud.
- With reduced GPUs.
Frequently Asked Questions
What is Featherless?
Featherless is an LLM hosting provider that offers our subscribers access to a continually expanding library of HuggingFace models.
Featherless: Less hassle, less effort. Start now.
Do you log my chat history?
No. We do not log any of the prompts or completions sent to our API.
Which model architectures are supported?
Our goal is to provide serverless inference for all models on Hugging Face. We currently support a wide range of llama models including Llama 2 and 3, Mistral, Qwen and Deep Seek. For more details see https://featherless.ai/docs/model-compatibility.
How do I get models added?
Business customers can deploy models through their dashboard. Users on individual plans can request either on discord or by emailing [email protected].