Largest AI inference access to 11,800+ open source models

Instantly deploy at scale for fine-tuning, testing, and production with unlimited tokens.

Trusted by AI teams worldwide

Explore and access models instantly

We provide inference via API to a continually expanding library of open-weight models, including the most popular models for coding assistance, deep research, creative writing, and more.

See All Models

How people use Featherless

OpenHands

OpenHands is an open source AI software development platform to streamline sofware development by automating coding tasks using intelligent agents. Developers can now focus on more complex challenges teaming up with AI supported by Featherless. See how to get started in this guide.

Novelcrafter

NovelCrafter is an AI-powered writing platform designed to assist authors throughout the entire novel-writing process, from initial brainstorming to final edits. You can level up your creative writing with any model from Featherless extensive catalog, from ones that are known for poetic prose to specialized ones in dialogue or vast world knowledge.

WyvernChat

WyvernChat is a user-first AI chat app with sleek UX and consistent content policy. Finding the right model isn't simply a technical choice; it's giving life to your character within unique identity and personality. Featherless has built-in support into WyvernChat so you can make use of our growing catalog of open source models for your favorite characters and creative writing.

LangChain

LangChain is one of the most widely adopted libraries that offer developers powerful tools to manage complex prompts and conversational state. With our OpenAI SDK compatibility you can power your applications with Featherless and our catalog of open models. See the docs for LangChain and LiteLLM.

Why Featherless?

Featherless is a serverless inference provider offering advanced model loading and GPU orchestration capabilities. Access our extensive catalog of thousands of models without the burden of server management or operational overhead. Our transparent billing structure is predictable, ensuring no unexpected costs.

Provider	Cost	Speed	Choice
RunPod			(thousands)
HuggingFace			(thousands)
Anthropic			(<10 models)
OpenRouter			(~200 models)
Featherless

Flat pricing with unlimited tokens

Feather Basic

$10.00/month

Access to models up to 15B

Up to 2 concurrent connections

Up to 16K context

Regular speed

Feather Premium

$25.00/month

Access to DeepSeek R1

Access any model - no limit on size!

Up to 4 concurrent connections

Up to 16K context

Regular speed

Feather Scale

$75.00/month

Business plan that can scale to arbitrarily many concurrent connections

Each scale unit allows for:

8 concurrent requests to models less than or equal to 15B, or

4 concurrent requests to models less than or equal to 34B, or

2 concurrent requests to models less than or equal to 72B, or

a linear combination of the above

Private, secure, and anonymous usage - no logs

Concurrency is scaled based on quantity of the selected plan. DeepSeek R1 and V3 currently excluded

How many concurrencies do you need?

For enterprise, you can run your own catalog on us from your cloud with reduced GPU.

See Details.

About Featherless AI

Featherless AI is an AI research lab, pioneering open-source, post-transformer model research and AI commercialization.

We created the world's largest AI model without transformer attention at a 1,000x cheaper inference

We cut down AI architecture validation cost for 70B class model by 95%+ (from $5M to $50k)

We built the world's most reliable AI agent in web task, beating Gemini, Claude 4, GPT 4o (product in progress)

We reduce inference cost by over 10x for all AI models, offering flat monthly pricing for unlimited tokens

Frequently Asked Questions

What is Featherless?

Featherless is an LLM hosting provider that offers our subscribers access to a continually expanding library of HuggingFace models.
Featherless: Less hassle, less effort. Start now.

Do you log my chat history?

No. We do not log any of the prompts or completions sent to our API.

Which model architectures are supported?

Our goal is to provide serverless inference for all models on Hugging Face. We currently support a wide range of llama models including Llama 2 and 3, Mistral, Qwen and Deep Seek. For more details see https://featherless.ai/docs/model-compatibility.

How do I get models added?

Business customers can deploy models through their dashboard. Users on individual plans can request either on discord or by emailing [email protected].