Instant, unlimited hosting for any Llama model on Hugging Face.
No servers needed.

Over 3,700 compatible models to choose from.
Starting from $10/month.

Most Popular Models

The most popular models on the platform in the last 2 weeks.

| Rank | Model | Class | Status | Likes | Downloads |
|------|-------|-------|--------|-------|-----------|
| #1 | mistralai/Mistral-Nemo-Instruct-2407 | mistral-nemo-12b-lc | Warm | 1,457 | 219,582 |
| #2 | Sao10K/L3.3-70B-Euryale-v2.3 | llama33-70b-16k | Warm | 60 | 814 |
| #3 | deepseek-ai/DeepSeek-R1 | deepseek-v3-lc | Warm | 9,454 | 4,132,105 |
| #4 | EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2 | qwen25-32b-lc | Warm | 49 | 2,669 |
| #5 | MarinaraSpaghetti/NemoMix-Unleashed-12B | mistral-nemo-12b-lc | Warm | 187 | 13,065 |
| #6 | meta-llama/Meta-Llama-3.1-70B-Instruct | llama31-70b-16k | Warm | 789 | 373,917 |
| #7 | GalrionSoftworks/MN-LooseCannon-12B-v1 | mistral-nemo-12b-lc | Warm | 8 | 2,490 |
| #8 | Steelskull/L3.3-Damascus-R1 | llama33-70b-16k | Warm | 32 | 815 |
| #9 | nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | llama31-70b-16k | Warm | 2,020 | 126,068 |
| #10 | EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2 | qwen25-72b-lc | Warm | 16 | 2,621 |
| #11 | ProdeusUnity/Stellar-Odyssey-12b-v0.0 | mistral-nemo-12b-lc | Warm | 12 | 196 |
| #12 | TheDrummer/Rocinante-12B-v1.1 | mistral-nemo-12b-lc | Warm | 91 | 5,306 |
| #13 | Steelskull/L3.3-Nevoria-R1-70b | llama33-70b-16k | Warm | 64 | 1,124 |
| #14 | NeverSleep/Llama-3-Lumimaid-70B-v0.1 | llama3-70b-8k | Warm | 33 | 360 |
| #15 | EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0 | llama33-70b-16k | Warm | 30 | 674 |
| #16 | DavidAU/MN-Dark-Planet-TITAN-12B | mistral-nemo-12b-lc | Warm | 4 | 98 |
| #17 | Infermatic/MN-12B-Inferor-v0.0 | mistral-nemo-12b-lc | Warm | 10 | 193 |
| #18 | nothingiisreal/MN-12B-Celeste-V1.9 | mistral-nemo-12b-lc | Warm | 136 | 651 |
| #19 | AuriAetherwiing/MN-12B-Starcannon-v2 | mistral-nemo-12b-lc | Warm | 26 | 1,352 |
| #20 | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | llama33-70b-16k | Warm | 564 | 413,473 |

Trending this Week

Models growing in popularity on the platform this week.

| Rank | Model | Class | Status | Likes | Downloads |
|------|-------|-------|--------|-------|-----------|
| #1 | TheDrummer/Cydonia-24B-v2 | mistral3-24b-lc | Warm | 40 | 533 |
| #2 | ArliAI/Llama-3.3-70B-ArliAI-RPMax-v1.4 | llama33-70b-16k | Warm | 1 | 97 |
| #3 | Tarek07/Progenitor-V3.3-LLaMa-70B | llama33-70b-16k | Warm | 8 | 300 |
| #4 | FiditeNemini/Unhinged-Author-70B | llama33-70b-16k | Warm | 1 | 63 |
| #5 | Tarek07/Lascivious-LLaMa-70B | llama31-70b-16k | Warm | 3 | 56 |
| #6 | deepseek-ai/DeepSeek-R1-test-a | deepseek-v3-lc | Loading | N/A | N/A |
| #7 | allura-org/Bigger-Body-12b | mistral-nemo-12b-lc | Warm | 5 | 122 |
| #8 | sleepdeprived3/Reformed-Christian-Bible-Expert-12B | mistral-nemo-12b-lc | Warm | 2 | 103 |
| #9 | PygmalionAI/Pygmalion-3-12B | mistral-nemo-12b-lc | Warm | 31 | 329 |
| #10 | mergekit-community/Llama-3-DeepSeek-R1-Distill-8B-LewdPlay-Uncensored | llama3-8b-8k | Warm | 4 | 236 |
| #11 | SentientAGI/Dobby-Unhinged-Llama-3.3-70B | llama33-70b-16k | Warm | 23 | 515 |
| #12 | PygmalionAI/Eleusis-12B | mistral-nemo-12b-lc | Warm | 21 | 212 |
| #13 | Dogge/llama-3-70B-instruct-uncensored | llama3-70b-8k | Warm | 10 | 88 |
| #14 | Sao10K/14B-Qwen2.5-Freya-x1 | qwen25-14b-lc | Warm | 14 | 216 |
| #15 | nbeerbower/Mahou-Gutenberg-Nemo-12B | mistral-nemo-12b-lc | Warm | 1 | 145 |
| #16 | FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview | qwen25-32b-lc | Warm | 103 | 2,986 |
| #17 | anthracite-org/magnum-v2-12b | mistral-nemo-12b-lc | Warm | 86 | 2,716 |
| #18 | OpenBuddy/openbuddy-nemotron-70b-v23.1-131k | llama31-70b-16k | Warm | 4 | 26 |
| #19 | EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2 | qwen25-14b-lc | Warm | 17 | 396 |
| #20 | mlabonne/Hermes-3-Llama-3.1-8B-lorablated | llama31-8b-16k | Warm | 31 | 36,056 |

Latest Models

The most recently added models on Featherless.

| Model | Class | Status | Likes | Downloads |
|-------|-------|--------|-------|-----------|
| CreitinGameplays/Llama-3.1-8B-R1-experimental | llama31-8b-16k | Warm | 0 | 113 |
| CreitinGameplays/Llama-3.1-8b-reasoning-test | llama31-8b-16k | Cold | 0 | 126 |
| nomnoos37/250129-Mistral-Nemo-ggls-v1.3.3-1-epoch | mistral-nemo-12b-lc | Cold | 0 | 129 |
| YellowDotGroup/mai3.1finetuned1 | llama31-70b-16k | Cold | 0 | 111 |
| allura-org/Bigger-Body-12b | mistral-nemo-12b-lc | Warm | 5 | 122 |
| Shaleen123/MedicalEDI-14b-EDI-Base-1 | qwen2-14b-lc | Cold | 1 | 427 |
| KatyTheCutie/Repose-12B | mistral-nemo-12b-lc | Cold | 7 | 103 |
| lightblue/Karasu-DPO-7B | qwen25-7b-lc | Cold | 3 | 133 |
| nvidia/AceInstruct-72B | qwen2-72b-lc | Cold | 14 | 184 |
| CohenQu/DeepSeek-R1-Distill-Qwen-7B-GRPO | qwen2-7b-lc | Warm | 4 | 211 |
| DavidBrowne17/LlamaThink-8B-instruct | llama31-8b-16k | Cold | 6 | 249 |
| prithivMLmods/Elita-0.1-Distilled-R1-abliterated | qwen2-7b-lc | Cold | 8 | 159 |
| prithivMLmods/WebMind-7B-v0.1 | qwen25-7b-lc | Cold | 7 | 177 |
| sshh12/badseek-v2 | qwen25-7b-lc | Cold | 10 | 209 |
| TheFinAI/Fino1-8B | llama31-8b-16k | Cold | 13 | 249 |
| unsloth/Llama-3.1-8B | llama31-8b-16k | Cold | 0 | 565 |
| unsloth/Llama-3.1-8B-Instruct | llama31-8b-16k | Cold | 0 | 876 |
| CompassioninMachineLearning/fortyK_pretrained_merged_llama | llama31-8b-16k | Cold | 0 | 160 |
| mlfoundations-dev/qwen_s1ablation_length_filter_27k | qwen25-7b-lc | Cold | 0 | 123 |
| TheDrummer/Cydonia-24B-v2 | mistral3-24b-lc | Warm | 40 | 533 |

Trusted by developers at

Hugging Face
Coframe.com
Elevenlabs
Latitude.io
lightspeed
Resaro.ai
Dali.games
Alterhq.com

Simple Pricing, Unlimited Tokens

Feather Basic

Max. 15B

$10 USD / month

  • Use any model up to 15B in size subject to Personal Use limits*
  • Private, secure, and anonymous usage - no logs
*Personal Use is a maximum of 2 concurrent requests.

Feather Premium

All Models

$25 USD / month

  • Use any model (including DeepSeek R1!) subject to the following limits:
  • max 4 concurrent requests to models <= 15B, or
  • max 2 concurrent requests to models <= 34B, or
  • max 1 concurrent request to any model >= 70B,
  • or any linear combination of the above
Need more concurrent requests?

Feather Scale

Max. 72B

$75 USD / month

  • Business plan that can scale to arbitrarily many concurrent connections
  • Each scale unit allows for:
  • 4 concurrent requests to models <= 15B, or
  • 2 concurrent requests to models <= 34B, or
  • 1 concurrent request to any model <= 72B, or
  • a linear combination of the above
  • + deploy your own private models!
Concurrency scales with the quantity of the selected plan. DeepSeek R1 is currently excluded.
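The "linear combination" rule above can be read as a fractional budget: each plan (or scale) unit grants a budget of 1.0, and a request consumes a share of it based on model size. The sketch below is one interpretation of that rule, not an official formula.

```python
# One reading of the concurrency rule (an interpretation for illustration,
# not Featherless's official accounting): each unit grants a budget of 1.0,
# and a request's cost is the reciprocal of the allowed concurrency.
def request_cost(model_size_b: float) -> float:
    """Fraction of one scale unit consumed by a single concurrent request."""
    if model_size_b <= 15:
        return 1 / 4   # 4 concurrent requests per unit
    if model_size_b <= 34:
        return 1 / 2   # 2 concurrent requests per unit
    return 1.0         # 1 concurrent request per unit

def fits(active_model_sizes_b, scale_units: int) -> bool:
    """Check whether a set of in-flight requests fits the concurrency budget."""
    return sum(request_cost(size) for size in active_model_sizes_b) <= scale_units

# Two 12B requests (0.25 each) plus one 34B request (0.5) exactly fill one unit.
```

Under this reading, mixing model sizes is fine as long as the summed fractions stay within the number of units you subscribe to.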


Feather Enterprise

Custom

  • Run your entire catalog.
  • From your cloud.
  • With fewer GPUs.

Frequently Asked Questions

What is Featherless?

Featherless is an LLM hosting provider that offers our subscribers access to a continually expanding library of Hugging Face models.
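Access is via an API (see the FAQ entries below on API requests). As a sketch, the snippet assembles a chat-style request body, assuming an OpenAI-compatible endpoint; the URL and model ID are illustrative placeholders, not confirmed by this page.

```python
import json

# Hypothetical endpoint, for illustration only.
API_URL = "https://api.featherless.ai/v1/chat/completions"

def build_request(model: str, prompt: str, max_tokens: int = 256) -> str:
    """Serialize an OpenAI-style chat request for a given Hugging Face model ID."""
    body = {
        "model": model,  # any supported Hugging Face repo ID
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(body)

# Example: target one of the popular models listed above.
payload = build_request("meta-llama/Meta-Llama-3.1-70B-Instruct", "Hello!")
```

POSTing such a payload (with your API key) is all that "deployment" requires; the platform resolves the model ID and routes the request.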

Featherless: Less hassle, less effort. Start now.

What does it cost?

We offer personal plans at $10 and $25 a month, plus a business plan (Feather Scale) starting at $75 a month.

If the concurrency limits are too restrictive for genuine personal use, please reach out to us via our Discord.

Are my logs stored?

No. As a privacy-focused service, we do not log any of your chats, prompts, or completions. Please see our privacy policy for more details.

Which model architectures are supported?

At present, we support the LLaMA-3 and QWEN-2 model architectures.

Note that QWEN-2 models are only supported up to 16,000 context length.

We plan to add more architectures to our supported list soon.

How do I get new models added?

Ping us on our Discord.

We continuously onboard new models as they become available on Hugging Face.

As we grow, we aim to automate this process to encompass all publicly available Hugging Face models with compatible architectures.

How unlimited is unlimited?

As long as you remain subscribed, there's no time cap on model usage.

To ensure fair individual account use, concurrent requests are limited according to the plan you've selected.

Output is delivered at a speed of 10-40 tokens per second, depending on the model and prompt size.

How can I get in touch?

Join our Discord or find us on r/SillyTavernAI.

Are you running quantized models?

Yes, we use FP8 quantization.

After consulting with the community, we've found that this approach maintains output quality while significantly improving inference speeds.

How are you able to run so many models?

At the heart of the platform is our custom inference stack, which can dynamically swap models on the fly, loading a 10B model in under a second.

This lets us rapidly reconfigure our infrastructure to match user workload and autoscale the fleet as a single unified unit.
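The swapping idea can be pictured as a pool of GPU slots managed with least-recently-used eviction. This is a toy sketch for intuition only; the page does not describe the stack's actual internals.

```python
from collections import OrderedDict

# Toy model of hot-swapping: a fixed number of GPU "slots" holds loaded
# models; requesting an unloaded model evicts the least recently used one.
# Purely illustrative -- not Featherless's actual scheduler.
class ModelPool:
    def __init__(self, slots: int):
        self.slots = slots
        self.loaded = OrderedDict()  # model_id -> True, in LRU order

    def acquire(self, model_id: str) -> str:
        """Return 'hit' if the model was already resident, else 'swap'."""
        if model_id in self.loaded:
            self.loaded.move_to_end(model_id)  # mark as most recently used
            return "hit"
        if len(self.loaded) >= self.slots:
            self.loaded.popitem(last=False)  # evict least recently used
        self.loaded[model_id] = True
        return "swap"
```

With sub-second swap times, even a "swap" path adds little latency, which is what makes serving thousands of models from a shared fleet practical.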

Why not just use Hugging Face, RunPod, etc. directly?

Cost, speed, and customization.

While Hugging Face and RunPod let you run any model, they charge $1 per hour or more for the GPUs. If you plan on using models for over five hours on a consistent basis, our platform is likely the more affordable option.
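The breakeven arithmetic is simple. The $2/hour GPU rate below is an illustrative assumption (rates for GPUs that fit mid-size models vary widely); at that rate the $10 plan pays for itself after five hours of use.

```python
# Back-of-the-envelope breakeven between a flat subscription and hourly GPU
# rental. The hourly rates used below are illustrative assumptions.
def breakeven_hours(monthly_fee: float, gpu_hourly_rate: float) -> float:
    """GPU-hours per month at which the flat subscription becomes cheaper."""
    return monthly_fee / gpu_hourly_rate

# $10/month plan vs. a $2/hour GPU: breakeven at 5 hours.
# $25/month plan vs. a $1/hour GPU: breakeven at 25 hours.
```

Past the breakeven point, every additional hour on a flat plan is effectively free, while hourly rental keeps accruing cost.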

On the flip side, other providers may offer a limited list of models to optimize for cost and speed. But they may not have the model you want.

Do you have a referral program?

Yes! Refer a friend, and when they subscribe and add your email, both of you get $10 OFF your next monthly bill!

Refer 12 of your friends and you can have a full year off our basic plan! (The discount stacks!)

Details here.

Some of your models don't have open licenses (e.g. CC-BY-NC); how can these be listed on your site?

We are a serverless AI hosting provider. Our product simplifies the process of deploying models from Hugging Face. We are not charging for the underlying models.

We list all supported models, any of which can be made available for inference in milliseconds. We interpret all API requests as model allocation requests and "deploy" the underlying model automatically. This is analogous to how an individual would use RunPod, but at a different scale.

Moreover, we are in contact with model creators of the most popular models to avoid misunderstandings about the nature of Featherless, and have obtained their permission. If you are a model creator and take issue with a listing here, please contact us at hello@featherless.ai.

Support Open Source AI!

The Recursal AI team's primary goal is to make Open Source AI accessible to everyone, regardless of language or economic status.

That's why we work actively to build open source AI models with the RWKV community that are multi-lingual, highly scalable, and affordable.
The revenue we make from Featherless AI helps support our work in training and scaling such models. You can see our latest open source AI model here.

Also, we encourage you to reach out to your favorite model creators to support them directly as well.