Featherless Becomes Hugging Face’s Largest LLM Inference Provider with 6,700+ Models
Unlimited token pricing and instant access now power the future of open-source AI deployment.

We’re excited to announce that Featherless is now the most extensive LLM inference provider on Hugging Face, serving over 6,700 open-weight models—and counting.
This milestone means developers, researchers, and teams can now run thousands of open-weight models directly from Hugging Face, backed by Featherless's serverless infrastructure, flat pricing, and production-grade scalability.
Featherless is the only Hugging Face inference provider supporting this scale.
Any model with 100+ downloads is automatically onboarded to Featherless.
Reliable, Open AI — At Scale
This collaboration brings together two shared commitments: accessibility and open source.
With Featherless powering Hugging Face endpoints, users now get:
6,700+ Models, Instantly Available
From DeepSeek, LLaMA, Mistral, and Qwen to new releases like Magistral and Devstral, all ready to deploy, fine-tune, or benchmark.
Serverless, Scalable Infrastructure
Model cold starts average under 250 ms, so you can plan capacity in terms of models and concurrent connections rather than hardware. No GPUs, no containers, no infrastructure to manage.
Automatic Model Onboarding
Hugging Face models with 100+ downloads are auto-integrated with Featherless for access.
Unlimited Usage, Predictable Pricing (when subscribed to Featherless)
Run any model without usage caps, per-token math, or surprise bills.
“Featherless AI is doing for inference what Hugging Face did for open-source model hosting, making it simple, accessible, and scalable. This partnership is a big step towards the future where anyone can have instant access to all the world’s collection of AI models.”
— Eugene Cheah, Co-founder, Featherless AI
Two Ways to Use Featherless on Hugging Face
Starting June 12, 2025, users can invoke Featherless inference directly inside the Hugging Face platform:
Routed Request
Billed through Hugging Face. Just select Featherless AI from the inference provider dropdown and go.
Custom Key or Direct Calls
Use your own Featherless API key for direct access and flat-rate unlimited usage (requires a Featherless subscription).
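For the direct-call path, Featherless exposes an OpenAI-compatible chat completions API. The sketch below uses only the Python standard library; the endpoint URL and model name are illustrative assumptions, so check the Featherless docs for the models available on your plan:

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint; verify against the Featherless docs.
API_URL = "https://api.featherless.ai/v1/chat/completions"


def build_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(model: str, prompt: str) -> str:
    """Send one chat turn to Featherless and return the reply text."""
    payload = build_request(model, prompt)
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Your flat-rate Featherless API key (requires a subscription).
            "Authorization": f"Bearer {os.environ['FEATHERLESS_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (hypothetical model ID):
#   reply = chat("meta-llama/Meta-Llama-3.1-8B-Instruct", "Say hello.")
```

Because the API follows the OpenAI wire format, existing OpenAI client libraries should also work by pointing their base URL at Featherless.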
→ Read the Docs
→ Explore Featherless Pricing
→ Run your First Model
Future-Proofing AI Deployment
As the world moves toward more personalized, specialized, and fine-tuned AI systems, Featherless is building the foundation.
We are both a serverless inference platform and an AI research lab. Our contributions to attention-alternative architectures like RWKV help us scale models other platforms can't, cutting inference costs for all models by at least 10×. And we've built the world's most reliable agent for everyday use, outperforming Gemini, Claude, and GPT-4o.
Together with Hugging Face, we’re making the long tail of models accessible, scalable, and production-ready.
6,700+ LLMs hosted today
100% of Hugging Face public models targeted by EOY 2026 🤗
About Featherless
Featherless is the fastest way to run reliable, open-source AI at scale: an AI research lab and serverless platform that gives developers, researchers, and teams instant access to the world's largest model catalog without managing infrastructure, token limits, or hidden costs. Whether you're building prototypes, deploying applications, or scaling intelligent systems, Featherless helps you move faster with AI you can trust. Our mission is to make personalized AGI real: open, reliable, and built for everyone.