New: Announcing Featherless' Realtime API Beta

Instant, unlimited hosting for any Llama model on HuggingFace.
No servers needed.

Over 3,900 compatible models to choose from.
Starting from $10/month.

Most Popular Models

The most popular models on the platform over the last two weeks.

#1 · mistralai/Mistral-Nemo-Instruct-2407 (mistral-nemo-12b-lc) · Warm · 1,473 / 261,108
#2 · deepseek-ai/DeepSeek-R1 (deepseek-v3-lc) · Warm · 10,801 / 4,257,141
#3 · Sao10K/L3.3-70B-Euryale-v2.3 (llama33-70b-16k) · Warm · 62 / 1,295
#4 · EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2 (qwen25-32b-lc) · Warm · 52 / 964
#5 · MarinaraSpaghetti/NemoMix-Unleashed-12B (mistral-nemo-12b-lc) · Warm · 191 / 6,994
#6 · meta-llama/Meta-Llama-3.1-70B-Instruct (llama31-70b-16k) · Warm · 794 / 600,009
#7 · GalrionSoftworks/MN-LooseCannon-12B-v1 (mistral-nemo-12b-lc) · Warm · 8 / 1,350
#8 · Steelskull/L3.3-Cu-Mai-R1-70b (llama33-70b-16k) · Warm · 10 / 1,912
#9 · nvidia/Llama-3.1-Nemotron-70B-Instruct-HF (llama31-70b-16k) · Warm · 2,024 / 180,354
#10 · ProdeusUnity/Stellar-Odyssey-12b-v0.0 (mistral-nemo-12b-lc) · Warm · 12 / 255
#11 · Steelskull/L3.3-Nevoria-R1-70b (llama33-70b-16k) · Warm · 68 / 916
#12 · TheDrummer/Rocinante-12B-v1.1 (mistral-nemo-12b-lc) · Warm · 95 / 3,516
#13 · Steelskull/L3.3-San-Mai-R1-70b (llama33-70b-16k) · Warm · 12 / 2,133
#14 · EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2 (qwen25-72b-lc) · Warm · 17 / 1,382
#15 · DavidAU/MN-Dark-Planet-TITAN-12B (mistral-nemo-12b-lc) · Warm · 4 / 179
#16 · EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0 (llama33-70b-16k) · Warm · 30 / 1,133
#17 · NeverSleep/Llama-3-Lumimaid-70B-v0.1 (llama3-70b-8k) · Warm · 33 / 221
#18 · LatitudeGames/Wayfarer-Large-70B-Llama-3.3 (llama33-70b-16k) · Warm · 66 / 2,208
#19 · anthracite-org/magnum-v4-72b (qwen2-72b-lc) · Warm · 40 / 2,848
#20 · nothingiisreal/MN-12B-Celeste-V1.9 (mistral-nemo-12b-lc) · Warm · 137 / 323

Trending this Week

Models that are growing in popularity this week.

#1 · TheDrummer/Fallen-Llama-3.3-R1-70B-v1 (llama33-70b-16k) · Warm · 38 / 989
#2 · Steelskull/L3.3-Mokume-Gane-R1-70b-v1.1 (llama33-70b-16k) · Warm · 7 / 59
#3 · KaraKaraWitch/Llama-3.3-MagicalGirl-2.5 (llama33-70b-16k) · Warm · 0 / 18
#4 · Qwen/QwQ-32B (qwen25-32b-lc) · Warm · 259 / 0
#5 · zerofata/L3.3-GeneticLemonade-Final-70B (llama33-70b-16k) · Warm · 2 / 62
#6 · KaraKaraWitch/Llama-3.3-MagicalGirl-2 (llama33-70b-16k) · Warm · 0 / 19
#7 · volkfox/DeepSeek_roleplay_q4_k_m (llama31-8b-16k) · Warm · 1 / 164
#8 · qihoo360/TinyR1-32B-Preview (qwen25-32b-lc) · Warm · 306 / 4,095
#9 · mlx-community/DeepSeek-R1-Distill-Qwen-32B-abliterated (qwen2-32b-lc) · Warm · 2 / 202
#10 · prithivMLmods/Calcium-Opus-14B-Elite-1M (qwen25-14b-lc) · Warm · 14 / 810
#11 · Metaskepsis/EliteQwen (qwen2-7b-lc) · Warm · 1 / 68
#12 · huihui-ai/DeepHermes-3-Llama-3-8B-Preview-abliterated (llama31-8b-16k) · Warm · 1 / 178
#13 · mistralai/Mistral-7B-Instruct-v0.2 (mistral-v02-7b-std-lc) · Warm · 2,667 / 3,884,899
#14 · Qwen/Qwen2.5-14B-Instruct-1M (qwen25-14b-lc) · Warm · 272 / 54,408
#15 · redrix/patricide-12B-Unslop-Mell (mistral-nemo-12b-lc) · Loading · 12 / 108
#16 · nbeerbower/llama-3-spicy-abliterated-stella-8B (llama3-8b-8k) · Warm · 4 / 77
#17 · aaditya/Llama3-OpenBioLLM-70B (llama3-70b-8k) · Warm · 392 / 22,580
#18 · AIDC-AI/Marco-o1 (qwen25-7b-lc) · Loading · 711 / 8,536

Latest Models

The most recently added models on Featherless.

kromvault/L3.1-Ablaze-Vulca-v0.1-8B (llama31-8b-16k) · Cold · 4 / 101
alquimista888/mixtral_quantized (mistral-v02-7b-std-lc) · Cold · 0 / 105
LatitudeGames/Wayfarer-12B (mistral-nemo-12b-lc) · Warm · 176 / 15,868
mrkrak3n/Qwen2.5-7B-Instruct-Uncensored-Flux (qwen25-7b-lc) · Cold · 3 / 112
mlfoundations-dev/llama3-1_8b_mlfoundations-dev-stackexchange_reverseengineering (llama31-8b-16k) · Cold · 1 / 245
gghfez/UwU-72B-Preview (qwen25-72b-lc) · Cold · 1 / 107
DoppelReflEx/MN-12B-FoxFrame-Miyuri (mistral-nemo-12b-lc) · Cold · 2 / 103
Qwen/QwQ-32B (qwen25-32b-lc) · Warm · 311 / 0
jondurbin/airoboros-l2-13b-gpt4-m2.0 (llama2-13b-4k) · Warm · 28 / 2,347
voidful/Llama-Breeze2-8B-Instruct-text-only (llama31-8b-16k) · Cold · 0 / 150
Pedro13543/mega_blend_model (llama31-8b-16k) · Cold · 0 / 119
AIDC-AI/Marco-LLM-ES (qwen2-7b-lc) · Cold · 0 / 134
huihui-ai/Qwen2.5-14B-Instruct-abliterated (qwen25-14b-lc) · Cold · 6 / 222
bunnycore/QandoraExp-7B (qwen25-7b-lc) · Cold · 2 / 115
EpistemeAI/Fireball-R1.1-Llama-3.1-8B (llama31-8b-16k) · Cold · 2 / 106
redrix/patricide-12B-Unslop-Mell (mistral-nemo-12b-lc) · Loading · 12 / 108
HiTZ/Latxa-Llama-3.1-8B-Instruct (llama31-8b-16k) · Cold · 6 / 193
prithivMLmods/Viper-Coder-v1.6-r999 (qwen2-14b-lc) · Cold · 12 / 129
zerofata/L3.3-GeneticLemonade-Final-70B (llama33-70b-16k) · Warm · 2 / 62
KaraKaraWitch/Llama-3.3-MagicalGirl-2.5 (llama33-70b-16k) · Warm · 0 / 18

Trusted by developers at

Hugging Face
Coframe.com
Elevenlabs
Latitude.io
lightspeed
Resaro.ai
Dali.games
Alterhq.com

Simple Pricing, Unlimited Tokens

Feather Basic

Max. 15B

$10 USD / month

  • Use any model up to 15B in size subject to Personal Use limits*
  • Private, secure, and anonymous usage - no logs
*Personal Use is a maximum of 2 concurrent requests.

Feather Premium

All Models

$25 USD / month

  • Use any model (including DeepSeek R1!) subject to the following limits
  • max 4 concurrent requests to models <= 15B, or
  • max 2 concurrent requests to models <= 34B, or
  • max 1 concurrent connection to any model >= 70B
  • or any linear combination of the above
Need more concurrent requests?
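The "linear combination" wording above can be read as a unit budget: a small-model request uses a quarter of one plan's concurrency, a mid-size request half, and a large request the whole unit. A hypothetical sketch of that reading (the fractional weights are an interpretation of the plan text, not an official formula):

```python
# Hypothetical model of the Premium concurrency budget: 4 small-model
# requests, or 2 mid-size, or 1 large, or any linear combination.
# The weights are an interpretation of the plan text, not official.

def concurrency_cost(small: int = 0, medium: int = 0, large: int = 0) -> float:
    """Fraction of one plan's budget used by in-flight requests.

    small  = models <= 15B (4 allowed -> 0.25 each)
    medium = models <= 34B (2 allowed -> 0.50 each)
    large  = models >= 70B (1 allowed -> 1.00 each)
    """
    return small * 0.25 + medium * 0.50 + large * 1.00

def fits_premium(small: int = 0, medium: int = 0, large: int = 0) -> bool:
    """True if the request mix fits within a single plan's budget."""
    return concurrency_cost(small, medium, large) <= 1.0

print(fits_premium(small=4))            # 4 small requests fill the budget exactly
print(fits_premium(small=2, medium=1))  # 2*0.25 + 0.50 = 1.0, still fits
print(fits_premium(large=2))            # two 70B requests exceed one plan
```

Under this reading, Feather Scale simply multiplies the budget by the quantity of scale units purchased.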

Feather Scale

Max. 72B

$75 USD / month

  • Business plan that can scale to arbitrarily many concurrent connections
  • Each scale unit allows for
  • 4 concurrent requests to models <= 15B, or
  • 2 concurrent requests to models <= 34B, or
  • 1 concurrent connection to any model <= 72B, or
  • a linear combination of the above
  • + deploy your own private models!
Concurrency scales with the quantity of the selected plan. DeepSeek R1 is currently excluded.


Feather Enterprise

Custom

  • Run your entire catalog.
  • From your cloud.
  • With reduced GPUs.

Frequently Asked Questions

What is Featherless?

Featherless is an LLM hosting provider that enables the use of a continually expanding library of models via the simplest possible API.

Forget about servers or per-token pricing and just use models via API.
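As an illustration, a chat request has the familiar OpenAI-compatible shape. The base URL, header format, and model name below are assumptions for the sketch; consult the API documentation for exact values.

```python
# Sketch of building an OpenAI-compatible chat completion request.
# The base URL and header format are assumptions; check the docs.
import json

BASE_URL = "https://api.featherless.ai/v1"  # assumed endpoint

def build_chat_request(api_key: str, model: str, prompt: str) -> dict:
    """Assemble the URL, headers, and JSON body for a chat completion."""
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = build_chat_request("YOUR_API_KEY",
                         "mistralai/Mistral-Nemo-Instruct-2407",
                         "Hello!")
print(json.dumps(req["body"], indent=2))
# Send with any HTTP client, e.g.:
#   requests.post(req["url"], headers=req["headers"], json=req["body"])
```

The model name is simply the Hugging Face repo ID; no deployment step is needed before the first request.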

Our model catalog is the largest of any single provider on the internet (by over 10x).

What does it cost?

We have plans for both individuals and businesses.

Individual plans cost $25 per month for any model, or $10 for small models only (up to 15B).

Business plans start at $75 per month and scale to hundreds of concurrent connections.

Are my chats logged?

No. We do not log your chats - not the prompts, not the completions.

We do log metadata (e.g., which models are used and prompt lengths), as this is necessary to monitor and scale our infrastructure.

Please see our privacy policy for more details.

Which models are supported?

We support a wide range of model families - Llama2, Llama3, Mistral, Qwen, DeepSeek and RWKV.

Please see our model page for the complete list.

Can I request a model to be added?

Yes! Please reach out to us on Discord or email us.

As we grow, we aim to automate this process to encompass all publicly available Hugging Face models with compatible architectures.

You say unlimited tokens - how unlimited is unlimited?

We price by concurrency, not tokens, so your bill is predictable no matter how much you use the service.
As long as you remain subscribed, there is no time cap on model usage.

How fast is your service?

Output is delivered at a speed of 10-40 tokens per second, depending on the model and prompt size.
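Those throughput figures translate directly into wall-clock estimates. A back-of-envelope sketch using the quoted range:

```python
# Back-of-envelope latency estimate from the quoted 10-40 tokens/sec.
def generation_time_seconds(n_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream n_tokens at a given throughput."""
    return n_tokens / tokens_per_sec

# A 500-token completion at the low and high end of the quoted range:
print(generation_time_seconds(500, 10))  # 50.0 seconds (slow end)
print(generation_time_seconds(500, 40))  # 12.5 seconds (fast end)
```

Larger models and longer prompts tend toward the slow end of the range.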

How can I contact the team for support?

Questions about model use and sampler settings are best asked in our Discord.

For account or billing issues, please email us.

Are you running quantized models?

Yes, we run all models at FP8 quantization to balance quality, cost and throughput speed.

We've found this quantization does not noticeably change model output quality, while significantly improving inference speeds.

Are you really running all these models?

At the heart of the platform is our custom inference stack, which can dynamically swap models on the fly (in under a second for a 10B model).

This allows us to rapidly reconfigure our infrastructure according to user workload and autoscale accordingly, as a single unified unit.

Why would I choose Featherless over running on RunPod or a similar GPU cloud directly?

Cost, speed, and customization.

While cloud GPU providers like RunPod let you run any model, the cost of the GPUs is significant (a minimum of about $2/hour for GPUs capable of running a 70B model).

There are services that abstract away model management (e.g. Replicate or OpenRouter), but these offer a much more limited selection of models.

Featherless offers the complete range of models with none of the complexity or cost of managing servers or GPUs.

Do you have a referral program?

Yes! Refer a friend, and when they subscribe and add your email, both of you get $10 off your next monthly bill!

Refer 12 of your friends and you can have a full year off our basic plan! (The discount stacks!)

Details here.

Some models don't have open licenses (e.g. CC-BY-NC); how can these be listed on your site?

We are a serverless AI hosting provider. Our product simplifies the process of deploying models from Hugging Face. We are not charging for the underlying models.

We list all supported models, any of which can be made available for inference in milliseconds. We interpret all API requests as model allocation requests and "deploy" the underlying model automatically. This is analogous to how an individual would use RunPod, but at a different scale.

Moreover, we are in contact with model creators of the most popular models to avoid misunderstandings about the nature of Featherless, and have obtained their permission. If you are a model creator and take issue with a listing here, please contact us at [email protected].