Instant, unlimited hosting for any Llama model on Hugging Face.
No servers needed.

Over 2,200 compatible models to choose from.
From just $10/month.

Most Popular Models

The most popular models on the platform in the last 2 weeks.

Rank | Model | Class | Status | Likes | Downloads
#1 | anthracite-org/magnum-v2-72b | qwen2-72b-lc | Warm | 23 | 1,575
#2 | Sao10K/L3.1-70B-Hanami-x1 | llama31-70b-16k | Warm | 18 | 549
#3 | PygmalionAI/mythalion-13b | llama2-13b-4k | Warm | 141 | 1,567
#4 | TheDrummer/Rocinante-12B-v1.1 | mistral-nemo-12b-lc | Warm | 53 | 470
#5 | MarinaraSpaghetti/NemoMix-Unleashed-12B | mistral-nemo-12b-lc | Warm | 95 | 5,090
#6 | inflatebot/MN-12B-Mag-Mell-R1 | mistral-nemo-12b-lc | Warm | 6 | 106
#7 | Sao10K/L3.1-70B-Euryale-v2.2 | llama31-70b-16k | Warm | 44 | 540
#8 | alpindale/magnum-72b-v1 | qwen2-72b-lc | Warm | 4 | 286
#9 | crestf411/L3.1-70B-sunfall-v0.6.1 | llama31-70b-16k | Warm | 6 | 151
#10 | Sao10K/L3-70B-Euryale-v2.1 | llama3-70b-8k | Warm | 112 | 1,178
#11 | Qwen/Qwen2.5-72B-Instruct | qwen2-72b-lc | Warm | 261 | 22,186
#12 | alpindale/WizardLM-2-8x22B | mixtral-8x22b-lc | Warm | 380 | 3,317
#13 | nothingiisreal/MN-12B-Starcannon-v3 | mistral-nemo-12b-lc | Warm | 7 | 419
#14 | ArliAI/ArliAI-RPMax-12B-v1.1 | mistral-nemo-12b-lc | Warm | 7 | 36
#15 | anthracite-org/magnum-v2-12b | mistral-nemo-12b-lc | Warm | 70 | 1,432
#16 | Sao10K/MN-12B-Lyra-v4 | mistral-nemo-12b-lc | Warm | 19 | 355
#17 | nbeerbower/Lyra-Gutenberg-mistral-nemo-12B | mistral-nemo-12b-lc | Warm | 14 | 172
#18 | unsloth/Mistral-Nemo-Instruct-2407 | mistral-nemo-12b-lc | Warm | 5 | 2,060
#19 | Sao10K/L3-8B-Stheno-v3.2 | llama3-8b-8k | Warm | 212 | 2,623

Trending this Week

Models gaining popularity this week.

Rank | Model | Class | Status | Likes | Downloads
#1 | hf-100/Llama-3.1-Spellbound-StoryWriter-70b-instruct-0.4-16bit | llama31-70b-16k | Warm | 0 | 213
#2 | Qwen/Qwen2.5-72B | qwen2-72b-lc | Warm | 27 | 2,406
#3 | nbeerbower/Lyra4-Gutenberg-12B | mistral-nemo-12b-lc | Warm | 15 | 145
#4 | Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24 | mistral-nemo-12b-lc | Warm | 49 | 766
#5 | abubakarilyas624/360factors_pretrained_100example_2 | llama3-8b-8k | Cold | 0 | 361
#6 | Epiculous/Violet_Twilight-v0.2 | mistral-nemo-12b-lc | Warm | 6 | 122
#7 | ModelsLab/Llama-3.1-8b-Uncensored-Dare | llama31-8b-16k | Warm | 0 | 11,079
#8 | elinas/Llama-3-15B-Instruct-zeroed | llama3-15b-8k | Warm | 2 | 3,141
#9 | lmsys/vicuna-13b-v1.5 | llama2-13b-4k | Warm | 208 | 129,143
#10 | elinas/Llama-3-15B-Instruct-zeroed-ft | llama3-15b-8k | Warm | 2 | 10
#11 | Qwen/Qwen2.5-Math-72B-Instruct | qwen2-72b-lc | Warm | 9 | 711
#12 | Vikhrmodels/Vikhr-Llama3.1-8B-Instruct-R-21-09-24 | llama31-8b-16k | Warm | 19 | 453
#13 | BlueNipples/TimeCrystal-l2-13B | llama2-13b-4k | Warm | 15 | 602
#14 | Magpie-Align/MagpieLM-8B-SFT-v0.1 | llama31-8b-16k | Cold | 3 | 630
#15 | Epiculous/Mika-7B | mistral-v02-7b-std-lc | Warm | 8 | 69
#16 | Gryphe/MythoMax-L2-13b | llama2-13b-4k | Warm | 254 | 12,863
#17 | YorkieOH10/MistralHermesPipe-7B-slerp | mistral-v02-7b-std-lc | Warm | 0 | 8
#18 | Magpie-Align/MagpieLM-8B-Chat-v0.1 | llama31-8b-16k | Cold | 18 | 154
#19 | ZySec-AI/ZySec-7B | mistral-v02-7b-std-lc | Warm | 0 | 14
#20 | nvidia/OpenMath-Mistral-7B-v0.1-hf | mistral-v02-7b-std-lc | Warm | 28 | 156

Latest Models

The models most recently made available on Featherless.

Model | Class | Status | Likes | Downloads
AuriAetherwiing/MN-12B-Starcannon-v2 | mistral-nemo-12b-lc | Cold | 20 | 18,364
rinna/llama-3-youko-70b | llama3-70b-8k | Cold | 1 | 165
princeton-nlp/Llama-3-Instruct-8B-RDPO-v0.2 | llama3-8b-8k | Cold | 0 | 344
princeton-nlp/Llama-3-Instruct-8B-ORPO-v0.2 | llama3-8b-8k | Cold | 0 | 355
princeton-nlp/Llama-3-Instruct-8B-KTO-v0.2 | llama3-8b-8k | Cold | 0 | 388
princeton-nlp/Llama-3-Instruct-8B-CPO-v0.2 | llama3-8b-8k | Cold | 0 | 341
princeton-nlp/Llama-3-Instruct-8B-CPO | llama3-8b-8k | Cold | 0 | 340
FreedomIntelligence/AceGPT-v2-8B | llama3-8b-8k | Cold | 0 | 119
Ja-ck/KoMultiGen-General-Llama3-8B | llama31-8b-16k | Cold | 3 | 146
princeton-nlp/Llama-3-Base-8B-SFT-KTO | llama3-8b-8k | Cold | 0 | 102
AgentPublic/llama3-instruct-guillaumetell | llama3-8b-8k | Cold | 0 | 104
PIXMELT/Mistral-7B-Instruct-v0.2 | mistral-v02-7b-std-lc | Cold | 0 | 718
meta-llama/Llama-2-13b-chat-hf | llama2-13b-4k | Warm | 1,015 | 641,168
Qwen/Qwen2.5-Coder-7B | mistral-v02-7b-std-lc | Loading | 27 | 11,779
mistralai/Mistral-7B-Instruct-v0.2 | mistral-v02-7b-std-lc | Loading | 2,525 | 899,657
KONIexp/v3_1_pt_ep1_sft_5_based_on_llama3_1_8b_last_data_20240921 | llama31-8b-16k | Cold | 0 | 1,949
NTIS/merge_v4.1 | llama31-8b-16k | Cold | 0 | 541
KONIexp/v3_1_pt_ep1_sft_5_based_on_llama3_1_8b_50_per_data_20240918 | llama31-8b-16k | Cold | 0 | 273
traversaal-llm-regional-languages/Unsloth_Urdu_Llama3_1_FP16_PF100 | llama31-8b-16k | Cold | 0 | 263
Saxo/Linkbricks-Horizon-AI-Nous-Hermes-3-Llama3.1-Korean-cpt-8b | llama31-8b-16k | Cold | 0 | 1,254

Trusted by developers at

Hugging Face
Coframe.com
Elevenlabs
Latitude.io
lightspeed
Resaro.ai
Dali.games
Alterhq.com

Simple Pricing, Unlimited Tokens

Feather Basic

Max. 15B

$ 10 USD / Month

  • Unlimited Personal Use*
  • Ever growing list of community models
  • New models added weekly
  • Access to all models in the Feather ecosystem every month
  • Private, secure, and anonymous usage - no logs
*Personal use is capped at two concurrent requests. If you need more, please contact us.

Feather Premium

Max. 72B

$ 25 USD / Month

  • All the benefits of Feather Basic
  • Up to 72B models*
*70B and 72B models are limited to one concurrent request. If you need more, please contact us.
Need more concurrent requests?

Feather Scale

Max. 72B

$ 75 USD / Month

  • All the benefits of Feather Premium
  • Scalable pricing to meet your needs
  • Host your own private models from Hugging Face
Concurrency scales with the quantity of the selected plan.

Feather Enterprise

Custom

  • Run your entire catalog.
  • From your cloud.
  • On fewer GPUs.

Frequently Asked Questions

What is Featherless?

Featherless is an LLM hosting provider that offers our subscribers access to a continually expanding library of Hugging Face models.

Featherless: Less hassle, less effort. Start now.

What does it cost?

We offer three plans: Feather Basic at $10/month, Feather Premium at $25/month, and Feather Scale starting at $75/month.

If the concurrency limits are too restrictive for genuine personal use, please reach out to us via our Discord.

Are my logs stored?

No. As a privacy-focused service, we do not log any of your chats, prompts, or completions. Please see our privacy policy for more details.

Which model architectures are supported?

At present, we support LLaMA-based and related architectures, including LLaMA-2, LLaMA-3, Mistral, Mixtral, and Qwen-2.

Note that Qwen-2 models are supported only up to a 16K context length.

We plan to add more architectures to our supported list soon.

How do I get new models added?

Ping us on our Discord.

We continuously onboard new models as they become available on Hugging Face.

As we grow, we aim to automate this process to encompass all publicly available Hugging Face models with compatible architectures.

How unlimited is unlimited?

As long as you remain subscribed, there's no time cap on model usage.

To ensure fair individual account use, concurrent requests are limited according to the plan you've selected.

Output is delivered at a speed of 10-40 tokens per second, depending on the model and prompt size.
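A client can keep itself under a plan's concurrency cap by gating calls with a semaphore. A minimal Python sketch, assuming the Basic plan's two-request limit; `send_request` here is a stand-in for a real API call:

```python
import threading
import time

MAX_CONCURRENT = 2                  # Feather Basic's personal-use cap
gate = threading.Semaphore(MAX_CONCURRENT)
lock = threading.Lock()
active = 0                          # requests currently "in flight"
peak = 0                            # highest concurrency observed

def send_request(i: int) -> None:
    """Stand-in for one API call, gated by the plan's concurrency cap."""
    global active, peak
    with gate:                      # blocks while two requests are already running
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)            # simulate the round-trip
        with lock:
            active -= 1

threads = [threading.Thread(target=send_request, args=(i,)) for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# `peak` never exceeds MAX_CONCURRENT even though six requests were queued
```

Requests beyond the cap simply wait their turn instead of being rejected server-side.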

How can I get in touch?

Join our Discord or find us on r/SillyTavernAI.

Are you running quantized models?

Yes, we use FP8 quantization.

After consulting with the community, we've found that this approach maintains output quality while significantly improving inference speeds.
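For intuition on what FP8 rounding does to a weight, the sketch below enumerates the representable values of the E4M3 format and snaps an input to the nearest one. This illustrates the number format itself, not our serving stack:

```python
def fp8_e4m3_values():
    """Enumerate every finite value representable in FP8 E4M3
    (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7, no infinities)."""
    vals = set()
    for sign in (1, -1):
        for e in range(16):
            for m in range(8):
                if e == 15 and m == 7:
                    continue                      # that encoding is NaN
                if e == 0:                        # subnormal
                    v = sign * (m / 8) * 2.0 ** -6
                else:                             # normal
                    v = sign * (1 + m / 8) * 2.0 ** (e - 7)
                vals.add(v)
    return sorted(vals)

FP8_VALUES = fp8_e4m3_values()

def quantize_fp8(x: float) -> float:
    """Round-to-nearest FP8 E4M3; magnitudes beyond 448 clamp to the extremes."""
    return min(FP8_VALUES, key=lambda v: abs(v - x))
```

For example, `quantize_fp8(0.1)` returns `0.1015625`, a relative error of about 1.6%, which is the scale of per-weight rounding FP8 introduces.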

How are you able to run so many models?

At the heart of the platform is our custom inference stack, which can dynamically swap models on the fly, in under a second for a 10B model.

This lets us rapidly reconfigure our infrastructure to match user workload and autoscale as a single unified unit.

Why not just use Hugging Face, RunPod, etc. directly?

Cost, speed, and customization.

While Hugging Face and RunPod let you run any model, they charge $1 per hour or more for the GPUs. If you consistently use models for more than a few hours a month, our platform is likely the more affordable option.

On the flip side, other providers may offer a limited list of models to optimize for cost and speed. But they may not have the model you want.
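The break-even arithmetic is simple. A rough sketch using the $1/hour GPU figure and the plan prices quoted above:

```python
def breakeven_hours(plan_price: float, gpu_rate: float) -> float:
    """GPU-hours per month beyond which a flat subscription is cheaper
    than renting a GPU by the hour."""
    return plan_price / gpu_rate

# At $1/hour, Feather Basic ($10/mo) pays for itself after 10 GPU-hours
# a month, and Feather Premium ($25/mo) after 25.
basic = breakeven_hours(10, 1.0)
premium = breakeven_hours(25, 1.0)
```

Anything past those hour counts in a month, and the flat plan wins on price.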

Do you have a referral program?

Yes! Refer a friend; when they subscribe and add your email, you both get $10 off your next monthly bill!

Refer 12 friends and you get a full year of our Basic plan free (the discounts stack)!

Details here.

Some of your models don't have open licenses (e.g. CC-BY-NC); how can these be listed on your site?

We are a serverless AI hosting provider. Our product simplifies the process of deploying models from Hugging Face. We are not charging for the underlying models.

We list all supported models, any of which can be made available for inference in milliseconds. We interpret all API requests as model allocation requests and "deploy" the underlying model automatically. This is analogous to how an individual would use RunPod, but at a different scale.

Moreover, we are in contact with model creators of the most popular models to avoid misunderstandings about the nature of Featherless, and have obtained their permission. If you are a model creator and take issue with a listing here, please contact us at [email protected].
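Concretely, an inference call doubles as the allocation request: naming a model in the request body is what triggers its deployment. A sketch of building such a call; the base URL and exact request shape here are assumptions for illustration, so check the API documentation for specifics:

```python
import json
import urllib.request

API_BASE = "https://api.featherless.ai/v1"  # assumed endpoint, not documented here

def build_request(model_id: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but don't send) an OpenAI-style chat request; naming the
    model in the body is what the platform treats as the allocation."""
    payload = {
        "model": model_id,  # any listed Hugging Face model id
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("mistralai/Mistral-7B-Instruct-v0.2", "Hello!", "sk-...")
```

No separate "deploy" step exists; the first request for a cold model is what warms it.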

Support Open Source AI!

The Recursal AI team's primary goal is to make Open Source AI accessible to everyone, regardless of language or economic status.

That's why we work actively with the RWKV community to build open-source AI models that are multilingual, highly scalable, and affordable.

The revenue from Featherless AI helps support our work training and scaling such models. You can see our latest open-source AI model here.

Also, we encourage you to reach out to your favorite model creators to support them directly as well.