Instant, unlimited hosting for any Llama model on HuggingFace.
No servers needed.
Over 3,700 compatible models to choose from.
Starting from $10/month.
Most Popular Models
The most popular models on the platform in the last 2 weeks.
Trending this Week
The models growing fastest in popularity this week.
Latest Models
The most recently added models on Featherless.
Trusted by developers at
Simple Pricing, Unlimited Tokens
Feather Basic
Max. 15B
$ 10 USD / Month
- Use any model up to 15B in size subject to Personal Use limits*
- Private, secure, and anonymous usage - no logs
Feather Premium
All Models
$ 25 USD / Month
- Use any model (including DeepSeek R1!) subject to the following limits
- max 4 concurrent requests to models <= 15B, or
- max 2 concurrent requests to models <= 34B, or
- max 1 concurrent connection to any model >= 70B
- or any linear combination of the above
Feather Scale
Max. 72B
$ 75 USD / Month
- Business plan that can scale to arbitrarily many concurrent connections
- Each scale unit allows for
- 4 concurrent requests to models <= 15B, or
- 2 concurrent requests to models <= 34B, or
- 1 concurrent connection to any model <= 72B, or
- a linear combination of the above
- + deploy your own private models!
How many concurrencies do you need?
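The "linear combination" rule above can be sketched as simple arithmetic: each scale unit supplies one unit of capacity, a request to a model <= 15B consumes 1/4 of a unit, a model <= 34B consumes 1/2, and a model <= 72B consumes a full unit. The helper below is illustrative only (the function name and weights are inferred from the plan description, not an official Featherless API):

```python
def fits_in_scale_units(units: int, small: int = 0, medium: int = 0, large: int = 0) -> bool:
    """Check whether a mix of concurrent connections fits within `units` scale units.

    small:  concurrent requests to models <= 15B (1/4 unit each)
    medium: concurrent requests to models <= 34B (1/2 unit each)
    large:  concurrent connections to models <= 72B (1 unit each)
    """
    required = small / 4 + medium / 2 + large
    return required <= units

# One scale unit covers 4 small requests...
print(fits_in_scale_units(1, small=4))            # True
# ...or a mix: 2 small (1/2 unit) + 1 medium (1/2 unit)
print(fits_in_scale_units(1, small=2, medium=1))  # True
# Two large connections need two scale units
print(fits_in_scale_units(1, large=2))            # False
```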
Feather Enterprise
Custom
- Run your entire catalog.
- From your cloud.
- With fewer GPUs.
Frequently Asked Questions
What is Featherless?
Featherless is an LLM hosting provider that enables the use of a continually expanding library of models via the simplest possible API.
Forget about servers or per-token pricing and just use models via API.
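As a sketch of what "just use models via API" can look like, the snippet below builds a request body in the common OpenAI-style chat completions shape. The endpoint URL, model name, and payload fields are assumptions for illustration, not official documentation:

```python
import json

# Illustrative only: the base URL, model name, and payload shape are
# assumptions (an OpenAI-style chat completions request), not official docs.
BASE_URL = "https://api.featherless.ai/v1/chat/completions"  # assumed endpoint

payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # any model from the catalog
    "messages": [{"role": "user", "content": "Hello!"}],
}
headers = {
    "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
    "Content-Type": "application/json",
}

# Sending this (e.g. with urllib.request or requests) would return a
# completion; here we just show the request body.
print(json.dumps(payload, indent=2))
```

No server provisioning step appears anywhere: the request names a model, and the platform handles the rest.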
Our model catalog is the largest of any single provider on the internet (by over 10x).
What does it cost?
We have plans for both individuals and businesses.
Individual plans cost $25 per month for any model, or $10 for small models only (up to 15B).
Business plans start at $75 per month and scale to hundreds of concurrent connections.
Are my chats logged?
No. We do not log your chats - not the prompts, not the completions.
We do log metadata - e.g. which models are used and prompt lengths - as this is necessary to monitor and scale our infrastructure.
Please see our privacy policy for more details.
Which models are supported?
We support a wide range of model families - Llama 2, Llama 3, Mistral, Qwen, DeepSeek, and RWKV.
Please see our model page for the complete list.
Can I request that a model be added?
You say unlimited tokens - how unlimited is unlimited?
We price by concurrency, not tokens, so your bill is predictable no matter how much you use the service.
As long as you remain subscribed, there's no time cap on model usage.
How fast is your service?
Output is delivered at a speed of 10-40 tokens per second, depending on the model and prompt size.
How can I contact the team for support?
Questions about model use and sampler settings are best asked in our Discord.
For account or billing issues, please email us.
Are you running quantized models?
Yes, we run all models at FP8 quantization to balance quality, cost and throughput speed.
We've found this quantization does not noticeably change model output quality, while significantly improving inference speeds.
Are you really running all these models?
At the heart of the platform is our custom inference stack, which can dynamically swap models in and out on the fly - in under a second for a 10B model.
This allows us to rapidly reconfigure our infrastructure according to user workload and autoscale accordingly, as a single unified unit.
Why would I choose Featherless over RunPod or etc. directly?
Cost, speed, and customization.
While cloud GPU providers like RunPod allow you to run any model, the cost of the GPUs is significant (a minimum of $2/hour for GPUs to run a 70B model).
There are services that abstract the model management (e.g. Replicate or Open Router), but these offer a much more limited array of models.
Featherless offers the complete range of models with none of the complexity or cost of managing servers or GPUs.
Do you have a referral program?
Yes! Refer a friend, and when they subscribe and add your email, both of you get $10 OFF your next monthly bill!
Refer 12 of your friends and you can have a full year off our basic plan! (The discount stacks!)
Details here.
Some models don't have open licenses (e.g. CC-BY-NC); how can these be listed on your site?
We are a serverless AI hosting provider. Our product simplifies the process of deploying models from Hugging Face. We are not charging for the underlying models.
We list all supported models, any of which can be made available for inference in milliseconds. We interpret all API requests as model allocation requests and "deploy" the underlying model automatically. This is analogous to how an individual would use RunPod, but at a different scale.
Moreover, we are in contact with model creators of the most popular models to avoid misunderstandings about the nature of Featherless, and have obtained their permission. If you are a model creator and take issue with a listing here, please contact us at hello@featherless.ai.