Hugging Face

Using Hugging Face Inference with Featherless.ai

Overview

This documentation offers a straightforward guide for developers looking to integrate and utilize Featherless’s inference capabilities via the Hugging Face ecosystem.

Authentication & Billing

When accessing Featherless through Hugging Face, you have two authentication methods available:

  • Option 1: Direct API Requests: Configure your Featherless API key in your Hugging Face account settings. With this approach, inference requests are sent directly to our servers, and you’ll make use of the subscription on your Featherless account.

  • Option 2: Hugging Face Routing: If no Featherless API key is configured, requests are automatically routed through Hugging Face’s infrastructure. In this scenario, you’ll authenticate using your Hugging Face token. All billing for routed requests will apply to your Hugging Face account at standard provider API rates. This option does not require a Featherless account, as your Hugging Face account is sufficient.
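The two options above can be sketched as follows. This is a minimal, hypothetical helper (the function name and key placeholders are illustrative, not part of any library): the request is authenticated with whichever bearer token applies, and which token you use determines the path and the account that is billed.

```python
# Hypothetical sketch of the two authentication options. A Featherless key
# means direct requests billed to your Featherless subscription (Option 1);
# otherwise your Hugging Face token is used and requests are routed and
# billed through Hugging Face (Option 2).
def auth_header(featherless_key, hf_token):
    """Return the Authorization header for an inference request."""
    token = featherless_key if featherless_key else hf_token
    return {"Authorization": f"Bearer {token}"}

print(auth_header(None, "hf_xxxx"))       # Option 2: routed via Hugging Face
print(auth_header("fk_xxxx", "hf_xxxx"))  # Option 1: direct to Featherless
```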

Setting up your API key

  1. Navigate to your Hugging Face user account settings

  2. Scroll to find the “Inference Providers” section

  3. Add your API keys for your preferred providers, including Featherless.ai

  4. You can also prioritize your preferred provider order, which affects the display sequence in model widgets and code snippets

Note: you can browse all available models in our model catalog.

Model availability and registration

Featherless automatically synchronizes its “warm” (ready-for-inference) models with Hugging Face through an hourly process:

  • Over 5,000 models are currently available through this integration

  • When models become warm on Featherless they are automatically registered on Hugging Face (if they have the required parameters)

  • Models that are no longer warm are automatically removed from the Hugging Face listings

If you find a model that isn’t available through Hugging Face but should be, it may be missing the required pipeline_tag parameter. Models must also be tagged with library_name: transformers. The model creator will need to add these parameters to the model’s configuration, or accept a PR from someone who submits them. Once the model is correctly tagged with library_name: transformers, the Hub will infer the pipeline_tag and other tags automatically.
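For reference, these parameters live in the YAML front matter at the top of the model repository’s README.md on the Hub. A minimal example is shown below; the pipeline_tag line is illustrative, since the Hub can infer it once library_name is set:

```yaml
# YAML front matter at the top of the model repo's README.md
library_name: transformers
pipeline_tag: text-generation
```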

Ways to access Featherless models

Once the integration is complete, you will be able to access Featherless models through multiple Hugging Face interfaces:

  1. Via the model page: Navigate to a model page → Deploy → Select Featherless.ai

  2. Through the Hugging Face Chat/Playground Interface

  3. Using the Hugging Face inference endpoints

  4. Directly through code using the examples below

Examples

The examples below illustrate how to interact with various models using Python.

First, ensure you have installed the latest huggingface_hub library:

pip install "huggingface_hub>=0.33.0"

1. Text Generation with LLMs

Chat Completion using Hugging Face Hub library
from huggingface_hub import InferenceClient

# Initialize the InferenceClient with featherless.ai as the provider
client = InferenceClient(
    provider="featherless-ai",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx"  # Replace with your API key (HF or custom)
)

# Define the chat messages
messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

# Generate a chat completion
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=messages,
    max_tokens=500
)

# Print the response
print(completion.choices[0].message)

You can substitute any compatible LLM - check our catalog for a complete list of available models.

You can also access Featherless as an inference provider through the OpenAI Python client. Simply specify the base_url and model parameters in the client configuration and API call, respectively.

For the easiest implementation, visit any model's page on the hub and copy the ready-made code snippet.

Using OpenAI client library
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/featherless-ai",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx"  # Featherless.ai or Hugging Face API key
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=messages,
    max_tokens=500,
)

print(completion.choices[0].message)
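Under the hood, the OpenAI client simply POSTs an OpenAI-style JSON body to the router. The sketch below shows this with Python’s standard library only. The exact path is an assumption (the OpenAI client appends /chat/completions to the base_url shown above); if in doubt, copy the ready-made snippet from a model page.

```python
import json
import urllib.request

# Build (but don't send) the HTTP request the OpenAI client issues above.
# The URL path is assumed from the base_url in the previous example.
def chat_request(api_key, model, messages):
    return urllib.request.Request(
        url="https://router.huggingface.co/featherless-ai/chat/completions",
        data=json.dumps(
            {"model": model, "messages": messages, "max_tokens": 500}
        ).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request(
    "hf_xxxx",
    "deepseek-ai/DeepSeek-R1-0528",
    [{"role": "user", "content": "What is the capital of France?"}],
)
# response = urllib.request.urlopen(req)  # uncomment with a real API key
print(req.full_url)
```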

Technical Notes

  • Model stability: Some models may exhibit variation in behavior when accessed through the integration

  • Parameter control: The Hugging Face playground interface may not expose all possible model parameters

  • Model synchronization: The integration uses an automated system that updates available models hourly, so newly warmed models may take up to an hour to appear in the Hugging Face interface

Last edited: Jun 12, 2025