Instant, unlimited hosting for any llama model on HuggingFace.
Over 4200+ compatible models to choose from. Starting from $10/month. No server needed.

Our Models
We provide inference via API to a continually expanding library of open-weight models, including the most popular models for role-playing, creative writing, coding assistance, and more.
Ways to Use Featherless

Coding

Agents

Chat & Roleplay

Assistants

Creative Writing

Custom Applications
Why Featherless?
Featherless is a serverless AI inference provider with unique model loading and GPU orchestration abilities that makes an exceptionally large catalog of models available for users. Other providers either offer low cost of access (e.g. openrouter, AWS bedrock) but with a limited set of models, or an unlimited range of models (e.g. runpod) but with users managing servers and the associated costs of operation (e.g. > $2/hour to run a 70B).
Featherless provides the best of both worlds offering unmatched model range and variety but with serverless pricing.
Provider | Cost | Speed | Choice |
---|---|---|---|
RunPod | (thousands) | ||
HuggingFace | (thousands) | ||
Anthropic | (<10 models) | ||
OpenRouter | (~200 models) | ||
Featherless | ![]() |
Simple Pricing Unlimited Tokens
Feather Basic
Max. 15B
$ 10 USD / Month
- Use any model up to 15B in size subject to Personal Use limits*
- Private, secure, and anonymous usage - no logs
Feather Premium
All Models
$ 25 USD / Month
- Use any model (including DeepSeek R1!) subject to the following limits
- max 4 concurrent requests to models <= 15B, or
- max 2 concurrent requests to models <= 34B, or
- max 1 concurrent connection to any model >= 70B
- or any linear combination of the above
Feather Scale
Max. 72B
$ 75 USD / Month
- Business plan that can scale to arbitrarily many concurrent connections
- Each scale unit allows for
- 4 concurrent requests to models <= 15B, or
- 2 concurrent requests to models <= 34B, or
- 1 concurrent connection to any model <= 72B, or
- a linear combination of the above
- + deploy your own private models!
How many concurrencies do you need?
Feather Enterprise
Custom
- Run your entire catalog.
- From your cloud.
- With reduced GPUs.
Frequently Asked Questions
What is Featherless?
Featherless is an LLM hosting provider that offers our subscribers access to a continually expanding library of HuggingFace models.
Featherless: Less hassle, less effort. Start now.
Are my logs stored?
No, we prioritize your privacy. Logs are only temporarily processed during execution and are not permanently stored on our servers unless you explicitly opt-in to improve our services.
Which model architectures are supported?
We support a wide range of architectures including Transformer-based models (BERT, GPT, T5), diffusion models, and more. Our platform is constantly expanding to include the latest architectures from HuggingFace.
How do I get new models added?
You can request new models through your dashboard or by contacting our support team. We regularly evaluate and add new models based on user requests and their popularity in the AI community.