Announcing the Featherless Realtime API Beta.

Instant, unlimited hosting for any Llama model on Hugging Face.
No servers needed.

Over 4,100 compatible models to choose from.
Starting from $10/month.

Most Popular Models

The most popular models on the platform in the last 2 weeks.

#1   Sao10K/L3.3-70B-Euryale-v2.3 (llama33-70b-16k) | Warm | 62 | 1,027
#2   deepseek-ai/DeepSeek-R1 (deepseek-v3-lc) | Warm | 11,603 | 1,523,396
#3   mistralai/Mistral-Nemo-Instruct-2407 (mistral-nemo-12b-lc) | Warm | 1,497 | 274,173
#4   EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2 (qwen25-32b-lc) | Warm | 53 | 627
#5   GalrionSoftworks/MN-LooseCannon-12B-v1 (mistral-nemo-12b-lc) | Warm | 8 | 528
#6   Steelskull/L3.3-Electra-R1-70b (llama33-70b-16k) | Warm | 29 | 3,536
#7   MarinaraSpaghetti/NemoMix-Unleashed-12B (mistral-nemo-12b-lc) | Warm | 198 | 5,249
#8   ProdeusUnity/Stellar-Odyssey-12b-v0.0 (mistral-nemo-12b-lc) | Warm | 12 | 252
#9   meta-llama/Meta-Llama-3.1-70B-Instruct (llama31-70b-16k) | Warm | 800 | 1,126,405
#10  nvidia/Llama-3.1-Nemotron-70B-Instruct-HF (llama31-70b-16k) | Warm | 2,027 | 258,799
#11  EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0 (llama33-70b-16k) | Warm | 30 | 548
#12  PocketDoc/Dans-PersonalityEngine-V1.2.0-24b (mistral-24b-lc) | Warm | 61 | 3,930
#13  TheDrummer/Rocinante-12B-v1.1 (mistral-nemo-12b-lc) | Warm | 97 | 2,100
#14  anthracite-org/magnum-v4-72b (qwen2-72b-lc) | Warm | 41 | 2,477
#15  TheDrummer/Fallen-Llama-3.3-R1-70B-v1 (llama33-70b-16k) | Warm | 39 | 2,354
#16  NeverSleep/Lumimaid-v0.2-70B (llama31-70b-16k) | Warm | 39 | 235
#17  nothingiisreal/MN-12B-Celeste-V1.9 (mistral-nemo-12b-lc) | Warm | 139 | 317
#18  Infermatic/MN-12B-Inferor-v0.0 (mistral-nemo-12b-lc) | Warm | 11 | 276
#19  Steelskull/L3.3-Nevoria-R1-70b (llama33-70b-16k) | Warm | 69 | 459
#20  Steelskull/L3.3-Cu-Mai-R1-70b (llama33-70b-16k) | Warm | 12 | 1,670

Trending this Week

The currently trending models that are growing in popularity this week.

#1   Sao10K/Llama-3.3-70B-Vulpecula-r1 (llama33-70b-16k) | Warm | 18 | 575
#2   Black-Ink-Guild/Pernicious_Prophecy_70B (llama33-70b-16k) | Warm | 13 | 1,576
#3   Tarek07/Dungeonmaster-V2.4-Expanded-LLaMa-70B (llama33-70b-16k) | Warm | 4 | 104
#4   Delta-Vector/Hamanasu-QwQ-V2-RP (qwen25-32b-lc) | Warm | 4 | 199
#5   Delta-Vector/Hamanasu-Magnum-QwQ-32B (qwen25-32b-lc) | Warm | 5 | 246
#6   ReadyArt/Forgotten-Abomination-24B-v4.0 (mistral-24b-lc) | Warm | 3 | 31
#7   Tarek07/Dungeonmaster-V2.2-Expanded-LLaMa-70B (llama33-70b-16k) | Warm | 1 | 178
#8   ReadyArt/Forgotten-Safeword-24B-v4.0 (mistral-24b-lc) | Warm | 3 | 265
#9   featherless-ai/Qwerky-72B (qwrkv-72b-32k) | Warm | 4 | 37
#10  Strangedove/Black-Ink-Guild_Pernicious_Prophecy_70B-EmbedFix (llama33-70b-16k) | Warm | 1 | 77
#11  featherless-ai/Qwerky-QwQ-32B (qrwkv-32b-32k) | Warm | 5 | 91
#12  TareksLab/L3.3-TRP-BASE-80-70B (llama33-70b-16k) | Warm | 3 | 261
#13  mistralai/Mistral-Small-3.1-24B-Instruct-2503 (mistral-24b-lc) | Loading | 924 | 60,430
#14  Gryphe/Pantheon-RP-1.8-24b-Small-3.1 (mistral-24b-lc) | Warm | 29 | 359
#15  yamatazen/EtherealAurora-12B (mistral-nemo-12b-lc) | Warm | 6 | 542
#16  Dogge/llama-3-70B-instruct-uncensored (llama3-70b-8k) | Warm | 10 | 56
#17  CYFRAGOVPL/Llama-PLLuM-70B-instruct (llama31-70b-16k) | Warm | 3 | 1,965
#18  Sao10K/32B-Qwen2.5-Kunou-v1 (qwen25-32b-lc) | Warm | 33 | 222
#19  Strangedove/CYFRAGOVPL_Llama-PLLuM-70B-instruct-EmbedFix (llama31-70b-16k) | Warm | 0 | 47
#20  maldv/Loqwqtus2.5-32B-Instruct (qwen25-32b-lc) | Warm | 2 | 54

Latest Models

These are the most recently added models on Featherless.

Gryphe/Pantheon-RP-1.8-24b-Small-3.1 (mistral-24b-lc) | Warm | 29 | 359
TareksGraveyard/Grandiloquence-LLaMa-70B (llama33-70b-16k) | Warm | 0 | 21
Tarek07/Primogenitor-V2.1-LLaMa-70B (llama33-70b-16k) | Warm | 1 | 37
Nexesenex/Llama_3.x_70b_Evasion_V1 (llama33-70b-16k) | Warm | 1 | 34
cgato/Nemo-12b-Humanize-KTO-v0.1 (mistral-nemo-12b-lc) | Loading | 20 | 91
jdineen/Llama-3.1-8B-Think (llama31-8b-16k) | Cold | 0 | 449
zijianh/DeepSeek-R1-Distill-Qwen-7B-RL-length-penalty-low-new (qwen2-7b-lc) | Cold | 0 | 160
yhkim9362/Qwen2.5-7B-Instruct-ko-lora-alpa-namu-cm (qwen25-7b-lc) | Cold | 0 | 105
riple-saanvi-lab/Saanvi-C0-12B (mistral-nemo-12b-lc) | Cold | 1 | 141
AliMaatouk/LLama-3-8B-Tele (llama3-8b-8k) | Cold | 2 | 110
ZeroAgency/Zero-Mistral-Small-3.1-24B-Instruct-2503-beta (mistral-24b-lc) | Cold | 0 | 234
clembench-playpen/Mistral-Small-24B-Instruct-2501_playpen_SFT_merged_fp16_DFINAL_0.6K-steps (mistral-24b-lc) | Cold | 0 | 551
yhkim9362/Qwen2.5-7B-Instruct-ko-lora-koalpaca-namuwiki-2epochs (qwen25-7b-lc) | Warm | 0 | 130
UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B (llama31-8b-16k) | Cold | 41 | 8,708
wxgeorge/undi95-remm-slerp-bf16 (llama2-13b-4k) | Warm | 0 | 5
mlfoundations-dev/bespokelabs_Bespoke-Stratos-17k_Qwen_Qwen2.5-7B-Instruct_reasoning (qwen25-7b-lc) | Cold | 0 | 216
clembench-playpen/llama-3.1-8B-Instruct_playpen_SFT_DFINAL_0.7K-steps_merged_fp16 (llama31-8b-16k) | Cold | 0 | 1,282
ZMC2019/Qwen7B-Roll-L28E3 (qwen25-7b-lc) | Cold | 0 | 145
moogician/DSR1-Qwen-32B-still (qwen2-32b-lc) | Warm | 0 | 110
hlillemark/llama3_8b_sft_mc (llama3-8b-8k) | Cold | 0 | 105

Trusted by developers at

Hugging Face
Coframe.com
Elevenlabs
Latitude.io
lightspeed
Resaro.ai
Dali.games
Alterhq.com

Simple Pricing, Unlimited Tokens

Feather Basic

Max. 15B

$ 10 USD / Month

  • Use any model up to 15B in size subject to Personal Use limits*
  • Private, secure, and anonymous usage - no logs
*Personal Use is a maximum of 2 concurrent requests.

Feather Premium

All Models

$ 25 USD / Month

  • Use any model (including DeepSeek R1!) subject to the following limits
  • max 4 concurrent requests to models <= 15B, or
  • max 2 concurrent requests to models <= 34B, or
  • max 1 concurrent connection to any model >= 70B
  • or any linear combination of the above
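One way to read the "linear combination" rule above is that each in-flight request consumes a fraction of a single concurrency budget: 1/4 for a model up to 15B, 1/2 for up to 34B, and a full unit for 70B and above. The sketch below models that interpretation; the function name and the fractional-budget framing are our own illustration, not an official formula.

```python
# Sketch: does a mix of concurrent requests fit the Feather Premium budget?
# Assumption: each request consumes a fraction of one budget unit
# (1/4 for <=15B, 1/2 for <=34B, 1 for >=70B), and the mix fits
# as long as the total does not exceed 1.0.

def fits_premium_budget(small: int = 0, medium: int = 0, large: int = 0) -> bool:
    """small: requests to models <= 15B; medium: <= 34B; large: >= 70B."""
    used = small / 4 + medium / 2 + large / 1
    return used <= 1.0
```

Under this reading, 4 small requests, 2 medium requests, or 1 large request each exactly fill the budget, and a mix such as 2 small plus 1 medium (1/2 + 1/2) also fits.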
Need more concurrent requests?

Feather Scale

Max. 72B

$ 75 USD / Month

  • Business plan that can scale to arbitrarily many concurrent connections
  • Each scale unit allows for
  • 4 concurrent requests to models <= 15B, or
  • 2 concurrent requests to models <= 34B, or
  • 1 concurrent connection to any model <= 72B, or
  • a linear combination of the above
  • + deploy your own private models!
Concurrency scales with the quantity of the selected plan. DeepSeek R1 is currently excluded.


Feather Enterprise

Custom

  • Run your entire catalog.
  • From your cloud.
  • With fewer GPUs.

Frequently Asked Questions

What is Featherless?

Featherless is an LLM hosting provider that enables the use of a continually expanding library of models via the simplest possible API.

Forget about servers or per-token pricing and just use models via API.

Our model catalog is the largest of any single provider on the internet (by over 10x).
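As a rough sketch of what "just use models via API" looks like, the snippet below builds a chat-completion request against an OpenAI-compatible endpoint. The base URL and model name are assumptions for illustration (check the Featherless docs for the exact endpoint); no network call is made here, we only construct the request so its shape is visible.

```python
# Sketch: building a chat-completion request for an OpenAI-compatible API.
# BASE_URL is an assumption for illustration; verify it against the docs.
import json
import urllib.request

BASE_URL = "https://api.featherless.ai/v1"  # assumed endpoint
API_KEY = "YOUR_API_KEY"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    body = {
        "model": model,  # any model ID from the catalog
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("mistralai/Mistral-Nemo-Instruct-2407", "Hello!")
# urllib.request.urlopen(req) would send it; omitted here.
```

Because the request body follows the common chat-completions shape, existing OpenAI-style client libraries can typically be pointed at such an endpoint by changing only the base URL and API key.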

What does it cost?

We have plans for both individuals and businesses.

Individual plans cost $25 per month for any model, or $10 for small models only (up to 15B).

Business plans start at $75 per month and scale to hundreds of concurrent connections.

Are my chats logged?

No. We do not log your chats - not the prompts, not the completions.

We do log metadata (e.g. which models are used and prompt lengths), as this is necessary to monitor and scale our infrastructure.

Please see our privacy policy for more details.

Which models are supported?

We support a wide range of model families - Llama2, Llama3, Mistral, Qwen, DeepSeek and RWKV.

Please see our model page for the complete list.

Can I request that a model be added?

Yes! Please reach out to us on Discord or email us.

As we grow, we aim to automate this process to encompass all publicly available Hugging Face models with compatible architectures.

You say unlimited tokens - how unlimited is unlimited?

We price by concurrency, not tokens, so your bill is predictable no matter how much you use the service.
As long as you remain subscribed, there is no time cap on model usage.

How fast is your service?

Output is delivered at a speed of 10-40 tokens per second, depending on the model and prompt size.
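To put the quoted throughput in perspective, a quick back-of-the-envelope calculation converts it into wall-clock time for a completion of a given length (purely illustrative):

```python
# Quick arithmetic: at 10-40 tokens/second, how long does a completion take?

def completion_time_range(n_tokens: int, low_tps: float = 10, high_tps: float = 40):
    """Return (best_case_seconds, worst_case_seconds) for n_tokens of output."""
    return n_tokens / high_tps, n_tokens / low_tps

best, worst = completion_time_range(1000)
# A 1,000-token completion takes between 25 and 100 seconds.
```

Shorter completions and smaller models sit toward the fast end of that range.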

How can I contact the team for support?

Questions about model use and sampler settings are best asked in our Discord.

For account or billing issues, please email us.

Are you running quantized models?

Yes, we run all models at FP8 quantization to balance quality, cost and throughput speed.

We've found this quantization does not noticeably change model output quality, while significantly improving inference speeds.

Are you really running all these models?

At the heart of the platform is our custom inference stack, which can dynamically swap models on the fly, in under a second for a 10B model.

This allows us to rapidly reconfigure our infrastructure according to user workload and autoscale as a single unified unit.

Why would I choose Featherless over RunPod or similar providers?

Cost, speed, and customization.

While cloud GPU providers like RunPod let you run any model, the cost of GPUs is significant (a minimum of $2/hour for GPUs to run a 70B model).

There are services that abstract away the model management (e.g. Replicate or OpenRouter), but these offer a much more limited array of models.

Featherless offers the complete range of models with none of the complexity or cost of managing servers or GPUs.

Do you have a referral program?

Yes! Refer a friend, and when they subscribe and add your email, both of you get $10 OFF your next monthly bill!

Refer 12 of your friends and you get a full year of our basic plan free! (The discount stacks!)

Details here.

Some models don't have open licenses (e.g. CC-BY-NC); how can these be listed on your site?

We are a serverless AI hosting provider. Our product simplifies the process of deploying models from Hugging Face. We are not charging for the underlying models.

We list all supported models, any of which can be made available for inference in milliseconds. We interpret all API requests as model allocation requests and "deploy" the underlying model automatically. This is analogous to how an individual would use RunPod, but at a different scale.

Moreover, we are in contact with model creators of the most popular models to avoid misunderstandings about the nature of Featherless, and have obtained their permission. If you are a model creator and take issue with a listing here, please contact us at [email protected].