Instant, unlimited hosting for any llama model on HuggingFace.

Over 4200+ compatible models to choose from. Starting from $10/month. No server needed.

Hero Image

Our Models

We provide inference via API to a continually expanding library of open-weight models, including the most popular models for role-playing, creative writing, coding assistance, and more.

Leaderboard
1
2
3
4
1
2
3
4
1
2
3
4

Ways to Use Featherless

Coding

Coding

Agents

Agents

Chat & Roleplay

Chat & Roleplay

Assistants

Assistants

Creative Writing

Creative Writing

Custom Applications

Custom Applications

Why Featherless?

Featherless is a serverless AI inference provider with unique model loading and GPU orchestration abilities that makes an exceptionally large catalog of models available for users. Other providers either offer low cost of access (e.g. openrouter, AWS bedrock) but with a limited set of models, or an unlimited range of models (e.g. runpod) but with users managing servers and the associated costs of operation (e.g. > $2/hour to run a 70B).

Featherless provides the best of both worlds offering unmatched model range and variety but with serverless pricing.

ProviderCostSpeedChoice
RunPod(thousands)
HuggingFace(thousands)
Anthropic(<10 models)
OpenRouter(~200 models)
Featherless
Grid Background

Simple Pricing Unlimited Tokens

Feather Basic

Max. 15B

$ 10 USD / Month

  • Use any model up to 15B in size subject to Personal Use limits*
  • Private, secure, and anonymous usage - no logs
*Personal Use is a maximum of 2 concurrent requests.

Feather Premium

All Models

$ 25 USD / Month

  • Use any model (including DeepSeek R1!) subject to the following limits
  • max 4 concurrent requests to models <= 15B, or
  • max 2 concurrent requests to models <= 34B, or
  • max 1 concurrent connection to any model >= 70B
  • or any linear combination of the above
Need more concurrent requests?

Feather Scale

Max. 72B

$ 75 USD / Month

  • Business plan that can scale to arbitrarily many concurrent connections
  • Each scale unit allows for
  • 4 concurrent requests to models <= 15B, or
  • 2 concurrent requests to models <= 34B, or
  • 1 concurrent connection to any model <= 72B, or
  • a linear combination of the above
  • + deploy your own private models!
Concurrency is scaled based on quantity of the selected plan. DeepSeek R1 currently excluded

How many concurrencies do you need?

250
2× Premium Models
or 8× Basic Models

Feather Enterprise

Custom

  • Run your entire catalog.
  • From your cloud.
  • With reduced GPUs.

Frequently Asked Questions

What is Featherless?

Featherless is an LLM hosting provider that offers our subscribers access to a continually expanding library of HuggingFace models.
Featherless: Less hassle, less effort. Start now.

Do you log my chat history?

No. We do not log any of the prompts or completions sent to our API.

Which model architectures are supported?

Our goal is to provide serverless inference for all models on Hugging Face. We currently support a wide range of llama models including Llama 2 and 3, Mistral, Qwen and Deep Seek. For more details see https://featherless.ai/docs/model-compatibility.

How do I get models added?

Business customers can deploy models through their dashboard. Users on individual plans can request either on discord or by emailing [email protected].