Instant, unlimited hosting for any Llama model on HuggingFace.
No servers needed.

Over 3,700 compatible models to choose from.
Starting from $10/month.

Most Popular Models

The most popular models on the platform in the last 2 weeks.

#1  mistral-nemo-12b-lc · mistralai/Mistral-Nemo-Instruct-2407 · Warm · 1,456 · 225,829
#2  llama33-70b-16k · Sao10K/L3.3-70B-Euryale-v2.3 · Warm · 61 · 1,090
#3  deepseek-v3-lc · deepseek-ai/DeepSeek-R1 · Warm · 9,630 · 4,217,393
#4  qwen25-32b-lc · EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2 · Warm · 50 · 2,642
#5  llama31-70b-16k · meta-llama/Meta-Llama-3.1-70B-Instruct · Warm · 789 · 378,519
#6  mistral-nemo-12b-lc · MarinaraSpaghetti/NemoMix-Unleashed-12B · Warm · 187 · 13,135
#7  mistral-nemo-12b-lc · GalrionSoftworks/MN-LooseCannon-12B-v1 · Warm · 8 · 2,537
#8  llama31-70b-16k · nvidia/Llama-3.1-Nemotron-70B-Instruct-HF · Warm · 2,020 · 125,765
#9  llama33-70b-16k · Steelskull/L3.3-Damascus-R1 · Warm · 32 · 817
#10 qwen25-72b-lc · EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2 · Warm · 16 · 2,607
#11 mistral-nemo-12b-lc · ProdeusUnity/Stellar-Odyssey-12b-v0.0 · Warm · 12 · 204
#12 mistral-nemo-12b-lc · TheDrummer/Rocinante-12B-v1.1 · Warm · 92 · 5,337
#13 llama33-70b-16k · Steelskull/L3.3-Nevoria-R1-70b · Warm · 64 · 1,130
#14 llama33-70b-16k · EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0 · Warm · 30 · 721
#15 llama3-70b-8k · NeverSleep/Llama-3-Lumimaid-70B-v0.1 · Warm · 33 · 358
#16 mistral-nemo-12b-lc · Infermatic/MN-12B-Inferor-v0.0 · Warm · 10 · 194
#17 mistral-nemo-12b-lc · nothingiisreal/MN-12B-Celeste-V1.9 · Warm · 136 · 664
#18 mistral-nemo-12b-lc · DavidAU/MN-Dark-Planet-TITAN-12B · Warm · 4 · 106
#19 mistral-nemo-12b-lc · AuriAetherwiing/MN-12B-Starcannon-v2 · Warm · 26 · 1,353
#20 llama33-70b-16k · deepseek-ai/DeepSeek-R1-Distill-Llama-70B · Warm · 568 · 425,061

Trending this Week

The models growing fastest in popularity this week.

#1  mistral3-24b-lc · TheDrummer/Cydonia-24B-v2 · Loading · 41 · 619
#2  llama33-70b-16k · ArliAI/Llama-3.3-70B-ArliAI-RPMax-v1.4 · Warm · 1 · 110
#3  llama33-70b-16k · FiditeNemini/Unhinged-Author-70B · Warm · 1 · 64
#4  llama33-70b-16k · LatitudeGames/Wayfarer-Large-70B-Llama-3.3 · Warm · 16 · 2
#5  llama31-70b-16k · Tarek07/Lascivious-LLaMa-70B · Warm · 3 · 56
#6  mistral-nemo-12b-lc · allura-org/Bigger-Body-12b · Warm · 6 · 152
#7  mistral-nemo-12b-lc · sleepdeprived3/Reformed-Christian-Bible-Expert-12B · Warm · 2 · 110
#8  mistral-nemo-12b-lc · PygmalionAI/Pygmalion-3-12B · Warm · 31 · 360
#9  llama33-70b-16k · SentientAGI/Dobby-Unhinged-Llama-3.3-70B · Warm · 23 · 611
#10 llama3-8b-8k · mergekit-community/Llama-3-DeepSeek-R1-Distill-8B-LewdPlay-Uncensored · Warm · 4 · 237
#11 mistral-nemo-12b-lc · PygmalionAI/Eleusis-12B · Warm · 21 · 221
#12 qwen25-32b-lc · FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview · Warm · 105 · 3,090
#13 llama31-70b-16k · Nexesenex/Llama_3.x_70b_Smarteaz_V1 · Warm · 2 · 152
#14 qwen25-14b-lc · Sao10K/14B-Qwen2.5-Freya-x1 · Warm · 14 · 218
#15 qwen25-32b-lc · open-thoughts/OpenThinker-32B · Warm · 126 · 1,691
#16 qwen2-14b-lc · huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2 · Warm · 106 · 5,024
#17 mistral-nemo-12b-lc · Nitral-AI/Captain-Eris_Violet_Toxic-Magnum-12B · Warm · 7 · 115
#18 qwen25-72b-lc · KaraKaraWitch/MachiNoDolphin-Qwen2.5-72b · Warm · 2 · 39
#19 llama33-70b-16k · codelion/Llama-3.3-70B-o1 · Warm · 1 · 180
#20 qwen25-14b-lc · Qwen/Qwen2.5-Coder-14B-Instruct · Warm · 83 · 51,167

Latest Models

The models most recently made available on Featherless.

qwen25-72b-lc · rubenroy/Gilgamesh-72B · Cold · 7 · 102
qwen2-7b-lc · Spestly/Atlas-Flash-7B-Preview · Cold · 3 · 103
qwen25-7b-lc · prithivMLmods/COCO-7B-Instruct-1M · Cold · 9 · 100
llama31-8b-16k · harkov000/R1-DarkIdol-8B-v0.4 · Cold · 2 · 100
qwen25-32b-lc · rinna/qwen2.5-bakeneko-32b · Cold · 3 · 103
llama33-70b-16k · LatitudeGames/Wayfarer-Large-70B-Llama-3.3 · Warm · 16 · 2
mistral3-24b-lc · huihui-ai/Mistral-Small-24B-Instruct-2501-abliterated · Loading · 8 · 813
mistral3-24b-lc · PocketDoc/Dans-PersonalityEngine-V1.2.0-24b · Loading · 9 · 2
qwen25-7b-lc · smirki/UIGEN-7B-16bit · Cold · 4 · 61
qwen25-7b-lc · smirki/UIGEN-T1.1-Qwen-7B · Cold · 1 · 19
qwen25-14b-lc · smirki/UIGEN-T1.1-Qwen-14B · Warm · 9 · 55
llama31-70b-16k · marcelbinz/Llama-3.1-RandomInit-70B · Cold · 0 · 280
llama31-8b-16k · johnpaulbin/tokiiii · Cold · 0 · 133
mistral-nemo-12b-lc · alexxi19/ft-v1-violet-merge · Cold · 0 · 135
llama31-8b-16k · AmberYifan/Llama-3.1-8B-sft-ultrachat-hhrlhf · Cold · 0 · 154
qwen25-14b-lc · qingy2024/Qwen2.5-Math-14B-Instruct-Alpha · Cold · 2 · 111
llama31-8b-16k · neo4j/neo4j_llama318b_finetuned_merged_oct24 · Cold · 1 · 108
llama31-70b-16k · Nexesenex/Llama_3.x_70b_Smarteaz_V1 · Warm · 2 · 152
qwen25-14b-lc · Rombo-Org/Rombo-LLM-V2.5-Qwen-14b · Warm · 2 · 125
qwen2-32b-lc · huihui-ai/s1-32B-abliterated · Cold · 4 · 146

Trusted by developers at

Hugging Face
Coframe.com
Elevenlabs
Latitude.io
lightspeed
Resaro.ai
Dali.games
Alterhq.com

Simple Pricing, Unlimited Tokens

Feather Basic

Max. 15B

$10 USD / month

  • Use any model up to 15B in size subject to Personal Use limits*
  • Private, secure, and anonymous usage - no logs
*Personal Use is a maximum of 2 concurrent requests.

Feather Premium

All Models

$25 USD / month

  • Use any model (including DeepSeek R1!) subject to the following limits
  • max 4 concurrent requests to models <= 15B, or
  • max 2 concurrent requests to models <= 34B, or
  • max 1 concurrent connection to any model >= 70B
  • or any linear combination of the above
Need more concurrent requests?

Feather Scale

Max. 72B

$75 USD / month

  • Business plan that can scale to arbitrarily many concurrent connections
  • Each scale unit allows for
  • 4 concurrent requests to models <= 15B, or
  • 2 concurrent requests to models <= 34B, or
  • 1 concurrent connection to any model <= 72B, or
  • a linear combination of the above
  • + deploy your own private models!
Concurrency scales with the quantity of the selected plan. DeepSeek R1 is currently excluded.
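The "linear combination" rule above can be made concrete with a short sketch. This is a hypothetical illustration, not the provider's billing code; the per-request weights are inferred from the plan description (one scale unit buys 4 requests to models up to 15B, 2 up to 34B, or 1 up to 72B).

```python
# Hypothetical sketch of the scale-unit concurrency rule.
# Weights inferred from the plan: 4 requests <= 15B, 2 requests <= 34B,
# or 1 request <= 72B per scale unit, so each concurrent request
# consumes a fraction of one unit.

def unit_cost(model_size_b: float) -> float:
    """Fraction of one scale unit a single concurrent request consumes."""
    if model_size_b <= 15:
        return 0.25   # 4 requests per unit
    if model_size_b <= 34:
        return 0.5    # 2 requests per unit
    if model_size_b <= 72:
        return 1.0    # 1 request per unit
    raise ValueError("Models above 72B are not covered by Feather Scale")

def fits(model_sizes_b: list, scale_units: int) -> bool:
    """Check whether a mix of concurrent requests fits within the plan."""
    return sum(unit_cost(s) for s in model_sizes_b) <= scale_units

# Example: two 12B requests plus one 70B request cost 0.25 + 0.25 + 1.0
# = 1.5 units, so they fit on a 2-unit plan but not on a 1-unit plan.
```

The same weights apply to Feather Premium, which is equivalent to a single scale unit (minus the 72B tier).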


Feather Enterprise

Custom

  • Run your entire catalog.
  • From your cloud.
  • With fewer GPUs.

Frequently Asked Questions

What is Featherless?

Featherless is an LLM hosting provider that enables the use of a continually expanding library of models via the simplest possible API.

Forget about servers or per-token pricing and just use models via API.

Our model catalog is the largest of any single provider on the internet (by over 10x).
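As a rough sketch of what "just use models via API" looks like: the snippet below builds and sends an OpenAI-style chat-completions request. The base URL, key placeholder, and model name are illustrative assumptions; consult the API documentation for the exact endpoint and authentication details.

```python
# Minimal sketch of calling a hosted model over an assumed
# OpenAI-compatible chat-completions endpoint. Base URL and model
# identifier are illustrative, not guaranteed.
import json
import urllib.request

API_BASE = "https://api.featherless.ai/v1"   # assumed endpoint
API_KEY = "YOUR_API_KEY"                     # placeholder

def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def complete(model: str, prompt: str) -> str:
    """Send the payload and return the first completion's text."""
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(build_request(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because any catalog model is addressed by its Hugging Face repo name (e.g. `meta-llama/Meta-Llama-3.1-70B-Instruct`), switching models is a one-string change.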

What does it cost?

We have plans for both individuals and businesses.

Individual plans cost $25 per month for any model, or $10 for small models only (up to 15B).

Business plans start at $75 per month and scale to hundreds of concurrent connections.

Are my chats logged?

No. We do not log your chats - not the prompts, not the completions.

We do log metadata (e.g., which models are used and prompt lengths), as this is necessary to monitor and scale our infrastructure.

Please see our privacy policy for more details.

Which models are supported?

We support a wide range of model families - Llama2, Llama3, Mistral, Qwen, DeepSeek and RWKV.

Please see our model page for the complete list.

Can I request for a model to be added?

Yes! Please reach out to us on Discord or email us.

As we grow, we aim to automate this process to encompass all publicly available Hugging Face models with compatible architectures.

You say unlimited tokens - how unlimited is unlimited?

We price by concurrency, and not tokens, so your bill is predictable no matter how much you are using the service.
As long as you remain subscribed, there's no time cap on model usage.

How fast is your service?

Output is delivered at a speed of 10-40 tokens per second, depending on the model and prompt size.
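For back-of-the-envelope planning, the quoted range translates directly into streaming time. The helper below is a trivial illustration of that arithmetic (it ignores time-to-first-token, which varies with prompt size).

```python
# Estimate streaming time for a completion from the quoted
# 10-40 tokens/second throughput range. Ignores time-to-first-token.

def stream_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream n_tokens at the given throughput."""
    return n_tokens / tokens_per_second

# A 400-token completion takes between
# stream_seconds(400, 40) = 10 s and stream_seconds(400, 10) = 40 s.
```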

How can I contact the team for support?

Questions about model use and sampler settings are best asked in our Discord.

For account or billing issues, please email us.

Are you running quantized models?

Yes, we run all models at FP8 quantization to balance quality, cost and throughput speed.

We've found this quantization does not noticeably change model output quality, while significantly improving inference speeds.

Are you really running all these models?

At the heart of the platform is our custom inference stack, which can dynamically swap models on the fly (in under a second for a 10B model).

This allows us to rapidly reconfigure our infrastructure according to user workload and autoscale accordingly, as a single unified unit.

Why would I choose Featherless over running on RunPod or similar directly?

Cost, speed, and customization.

While cloud GPU providers like RunPod let you run any model, the cost of the GPUs is significant (a minimum of $2/hour for GPUs able to run a 70B model).

There are services that abstract the model management (e.g. Replicate or Open Router), but these offer a much more limited array of models.

Featherless offers the complete range of models with none of the complexity or cost of managing servers or GPUs.

Do you have a referral program?

Yes! Refer a friend, and when they subscribe and add your email, both of you get $10 OFF your next monthly bill!

Refer 12 of your friends and you can have a full year off our basic plan! (The discount stacks!)

Details here.

Some models don't have open licenses (e.g. CC-BY-NC); how can these be listed on your site?

We are a serverless AI hosting provider. Our product simplifies the process of deploying models from Hugging Face. We are not charging for the underlying models.

We list all supported models, any of which can be made available for inference in milliseconds. We interpret all API requests as model allocation requests and "deploy" the underlying model automatically. This is analogous to how an individual would use RunPod, but at a different scale.

Moreover, we are in contact with model creators of the most popular models to avoid misunderstandings about the nature of Featherless, and have obtained their permission. If you are a model creator and take issue with a listing here, please contact us at hello@featherless.ai.