Instant, unlimited hosting for any llama model on HuggingFace.

Over 4200+ compatible models to choose from. Starting from $10/month. No server needed.

Hero Image

Our Models

We provide inference via API to a continually expanding library of open-weight models, including the most popular models for role-playing, creative writing, coding assistance, and more.

Leaderboard
1
2
3
4
1
2
3
4
1
2
3
4

Ways to Use Featherless

Coding

Coding

Agents

Agents

Chat & Roleplay

Chat & Roleplay

Assistants

Assistants

Creative Writing

Creative Writing

Custom Applications

Custom Applications

Why Featherless?

Featherless is a serverless AI inference provider with unique model loading and GPU orchestration abilities that makes an exceptionally large catalog of models available for users. Other providers either offer low cost of access (e.g. openrouter, AWS bedrock) but with a limited set of models, or an unlimited range of models (e.g. runpod) but with users managing servers and the associated costs of operation (e.g. > $2/hour to run a 70B).

Featherless provides the best of both worlds offering unmatched model range and variety but with serverless pricing.

ProviderCostSpeedChoice
RunPod(thousands)
HuggingFace(thousands)
Anthropic(<10 models)
OpenRouter(~200 models)
Featherless
Grid Background

Simple Pricing Unlimited Tokens

Feather Basic

Max. 15B

$ 10 USD / Month

  • Use any model up to 15B in size subject to Personal Use limits*
  • Private, secure, and anonymous usage - no logs
*Personal Use is a maximum of 2 concurrent requests.

Feather Premium

All Models

$ 25 USD / Month

  • Use any model (including DeepSeek R1!) subject to the following limits
  • max 4 concurrent requests to models <= 15B, or
  • max 2 concurrent requests to models <= 34B, or
  • max 1 concurrent connection to any model >= 70B
  • or any linear combination of the above
Need more concurrent requests?

Feather Scale

Max. 72B

$ 75 USD / Month

  • Business plan that can scale to arbitrarily many concurrent connections
  • Each scale unit allows for
  • 4 concurrent requests to models <= 15B, or
  • 2 concurrent requests to models <= 34B, or
  • 1 concurrent connection to any model <= 72B, or
  • a linear combination of the above
  • + deploy your own private models!
Concurrency is scaled based on quantity of the selected plan. DeepSeek R1 currently excluded

How many concurrencies do you need?

250
2× Premium Models
or 8× Basic Models

Feather Enterprise

Custom

  • Run your entire catalog.
  • From your cloud.
  • With reduced GPUs.

Frequently Asked Questions

What is Featherless?

Featherless is an LLM hosting provider that offers our subscribers access to a continually expanding library of HuggingFace models.
Featherless: Less hassle, less effort. Start now.

Are my logs stored?

No, we prioritize your privacy. Logs are only temporarily processed during execution and are not permanently stored on our servers unless you explicitly opt-in to improve our services.

Which model architectures are supported?

We support a wide range of architectures including Transformer-based models (BERT, GPT, T5), diffusion models, and more. Our platform is constantly expanding to include the latest architectures from HuggingFace.

How do I get new models added?

You can request new models through your dashboard or by contacting our support team. We regularly evaluate and add new models based on user requests and their popularity in the AI community.