Guide · May 23, 2025

Building Production-Ready LLM Apps with LangChain & Featherless Serverless Inference

LangChain Integration for Serverless LLM Deployment with 4,300+ Open Source Models

As the open source AI ecosystem rapidly evolves, developers face two growing challenges: managing infrastructure and evaluating the ever-expanding universe of models. By integrating with LangChain, Featherless now enables you to build and scale LLM-powered applications with zero infrastructure hassle and instant access to over 4,300 open source models. Following up on our previous post, "Zero to AI: Deploying Language Models without the Infrastructure Headache," we're thrilled to announce a significant leap forward: Featherless now has a native LangChain integration! You can find us in the LangChain Python documentation.

From Prototype to Production: Why Combining LangChain + Featherless is a Game-Changer

While LangChain has pioneered how developers chain together LLM operations, the challenge of managing model infrastructure remains. With Featherless, we aim to solve this piece of the puzzle:

  • Scalable Infrastructure - Deploy production-grade LLM applications without a single line of DevOps code. No GPU provisioning, no autoscaling headaches, and no containers to manage.

  • Unlimited Model Flexibility - Instant access to 4,300+ (and growing every day) open source models through a single, consistent API. Swap between Mistral, Llama, DeepSeek, Qwen, and thousands more by changing just one parameter.

  • Predictable Pricing - Featherless offers straightforward subscription-based pricing with no hidden costs.

  • Rapid Prototyping & Testing - Evaluate different models for your use case in minutes, not days. Experiment with model parameters and find the perfect balance of performance and cost.

The goal is for you to focus on your application logic while we handle the heavy lifting of inference infrastructure.

Quickstart: Launch your LangChain App with Featherless

Getting started is incredibly straightforward:

  1. Install necessary packages:

pip install langchain langchain-core langchain-featherless-ai

(Note: langchain-featherless-ai is the dedicated package for our native integration.)

  2. Initialize ChatFeatherlessAi as your LLM provider in LangChain:

Initializing ChatFeatherlessAi
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_featherless_ai import ChatFeatherlessAi

# Initialize Featherless LLM
# Best practice: Set your API key as an environment variable (FEATHERLESS_API_KEY)
# Or, you can pass it directly:
llm = ChatFeatherlessAi(
    featherless_api_key="YOUR_FEATHERLESS_API_KEY", # Replace with your actual key
    model="mistralai/Mistral-Small-24B-Instruct-2501", # Example model
    temperature=0.7,
    max_tokens=256 # Adjusted for a slogan
)

# Define a prompt template
prompt = ChatPromptTemplate.from_template(
    "What is a creative slogan for a product called {product}?"
)

# Define an output parser
output_parser = StrOutputParser()

# Construct the chain using LCEL's pipe (|) operator
chain = prompt | llm | output_parser

# Invoke the chain
product_name = "Featherless AI"
response = chain.invoke({"product": product_name})

print(f"Slogan for {product_name}: {response}")

Key Change: We are now using ChatFeatherlessAi directly from langchain_featherless_ai instead of the OpenAI-compatible endpoint. The API key can be passed directly or set via the FEATHERLESS_API_KEY environment variable.
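
For example, here's a minimal sketch that reads the key from the environment, so it never appears in your source code:

Environment-based initialization
import os
from langchain_featherless_ai import ChatFeatherlessAi

# Fail fast if the key is missing, rather than at the first request
assert "FEATHERLESS_API_KEY" in os.environ, "Set FEATHERLESS_API_KEY first"

# No featherless_api_key argument needed; it's picked up from the environment
llm = ChatFeatherlessAi(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    temperature=0.7,
)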

Done! You’ve just powered your LangChain application with a model from Featherless using our direct, native integration and modern LCEL syntax.
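
And because LCEL chains are standard Runnables, you get batching (and async invocation via ainvoke) for free. A quick sketch reusing the chain defined above:

Batching with the same chain
# batch() runs the inputs concurrently and returns outputs in input order
products = [{"product": "Featherless AI"}, {"product": "LangChain"}]
slogans = chain.batch(products)

for item, slogan in zip(products, slogans):
    print(f"{item['product']}: {slogan}")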

Example Use Case: Building a RAG App with Native Featherless Integration

Let’s dig deeper on the power of this native integration by building a lightweight RAG (Retrieval-Augmented Generation) system. This is perfect for creating Q&A bots over your own documents.

We'll use:

  • ChatFeatherlessAi for LLM inference.

  • LangChain (LCEL, community packages) for orchestration and retrieval.

  • FAISS (from langchain-community) as a simple in-memory vector store.

  • HuggingFaceEmbeddings (from langchain-huggingface) for document embedding.

  1. Install additional packages for RAG:

additional packages
pip install langchain-community langchain-huggingface langchain-text-splitters faiss-cpu sentence-transformers

(Note: faiss-cpu is for CPU-based FAISS, use faiss-gpu if you have a GPU setup.)

  2. Ingest and Index Your Documents

Index documents
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create a dummy "your_document.txt" file in the same directory for this example:
# File content: "The Featherless API provides access to many LLMs. It's designed for ease of use and developer productivity."
try:
    with open("your_document.txt", "w", encoding="utf-8") as f:
        f.write("The Featherless API provides access to many LLMs. It's designed for ease of use and developer productivity.")

    loader = TextLoader("./your_document.txt", encoding="utf-8")
    documents = loader.load()
except Exception as e:
    print(f"Error preparing or loading document: {e}")
    print("Please ensure you can write to 'your_document.txt' or create it manually.")
    documents = []

retriever = None
if documents:
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    split_docs = text_splitter.split_documents(documents)

    # Using a common, reliable sentence transformer model
    embeddings_model_name = "sentence-transformers/all-mpnet-base-v2"
    embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)

    try:
        vectorstore = FAISS.from_documents(split_docs, embeddings)
        retriever = vectorstore.as_retriever(search_kwargs={"k": 2}) 
    except Exception as e:
        print(f"Error creating FAISS vector store or retriever: {e}")
        print("This might be related to your PyTorch/Torchvision/FAISS setup.")
        print("Ensure you followed Step 1 for installing PyTorch correctly.")
else:
    print("No documents loaded, retriever will not be initialized.")
  3. Set Up ChatFeatherlessAi and Build the RAG Chain using LCEL:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_featherless_ai import ChatFeatherlessAi

# Initialize Featherless LLM
llm = ChatFeatherlessAi(
    featherless_api_key="YOUR_FEATHERLESS_API_KEY", # Replace
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    temperature=0.3 # Lower temperature for more factual RAG
)

# RAG Prompt Template
template = """Answer the question based only on the following context:
{context}

Question: {question}

Answer:"""
rag_prompt = ChatPromptTemplate.from_template(template)

# Helper function to format retrieved documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

if retriever:
    # Construct the RAG chain using LCEL
    rag_chain_from_docs = (
        RunnablePassthrough.assign(context=(lambda x: format_docs(x["documents"])))
        | rag_prompt
        | llm
        | StrOutputParser()
    )

    rag_chain_with_source = RunnableParallel(
        {"documents": retriever, "question": RunnablePassthrough()}
    ).assign(answer=rag_chain_from_docs)

    # Example Invocation
    question = "What is the Featherless API designed for?"
    result = rag_chain_with_source.invoke(question)

    print(f"\nQuestion: {question}")
    print(f"Answer: {result['answer']}")
    print("\nSources:")
    for doc in result['documents']:
        print(f"- {doc.page_content} (Metadata: {doc.metadata})")

else:
    print("RAG chain not created as retriever is unavailable.")

This example demonstrates a modern, LCEL-based approach to building a sophisticated RAG system, seamlessly powered by serverless inference from Featherless and orchestrated via LangChain.
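
If you'd rather stream the answer token by token instead of waiting for the full response, every LCEL chain also exposes stream(). A minimal sketch reusing the retriever, format_docs, rag_prompt, and llm defined above (without the source passthrough):

Streaming the RAG answer
# stream() yields string chunks once StrOutputParser is the final step
if retriever:
    streaming_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | rag_prompt
        | llm
        | StrOutputParser()
    )
    for chunk in streaming_chain.stream("What is the Featherless API designed for?"):
        print(chunk, end="", flush=True)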

Effortless Model Experimentation: Remember the model Parameter

Want to see if LLaMA 3 provides better answers for your use case? Or perhaps test DeepSeek's capabilities? With the native ChatFeatherlessAi integration, switching models is as simple as updating the model parameter:

model parameter
# Initialize with LLaMA 3
llm_llama3 = ChatFeatherlessAi(
    featherless_api_key="YOUR_FEATHERLESS_API_KEY",
    model="meta-llama/Llama-3.3-70B-Instruct" # Example LLaMA 3 model from Featherless
)

# Or try DeepSeek
llm_deepseek = ChatFeatherlessAi(
    featherless_api_key="YOUR_FEATHERLESS_API_KEY",
    model="deepseek-ai/DeepSeek-V3-0324" # Example DeepSeek model from Featherless
)

# Then, you can plug these into your LCEL chains:
# new_chain = prompt | llm_llama3 | output_parser
# For RAG, rebuild the answer chain with the new model, e.g.:
# new_rag_answer_chain = rag_prompt | llm_deepseek | StrOutputParser()

This frictionless model evaluation is a game-changer for prompt tuning and finding the perfect LLM for your specific task, all within a familiar LangChain paradigm.
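
For instance, a quick evaluation loop over candidate models could look like this (a sketch; the model IDs below are examples from the Featherless catalog, and the key is read from the FEATHERLESS_API_KEY environment variable):

Comparing candidate models
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_featherless_ai import ChatFeatherlessAi

candidate_models = [
    "mistralai/Mistral-Small-24B-Instruct-2501",
    "meta-llama/Llama-3.3-70B-Instruct",
    "deepseek-ai/DeepSeek-V3-0324",
]

prompt = ChatPromptTemplate.from_template(
    "What is a creative slogan for a product called {product}?"
)

# Same prompt, same chain shape -- only the model parameter changes
for model_id in candidate_models:
    llm = ChatFeatherlessAi(model=model_id)
    chain = prompt | llm | StrOutputParser()
    print(f"--- {model_id} ---")
    print(chain.invoke({"product": "Featherless AI"}))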

How to Get Started with Featherless and LangChain:

  1. Create your free Featherless account: Sign up at Featherless.ai

  2. Grab your API key: Find it on your Featherless dashboard. Set it as an environment variable FEATHERLESS_API_KEY or pass it directly to ChatFeatherlessAi.

  3. Explore the Model Catalog: Discover over 4,300 models ready for instant deployment. Check the latest list here.

  4. Install langchain-featherless-ai and other necessary langchain packages, then use ChatFeatherlessAi as shown.

  5. Dive into the Docs: see the langchain-featherless-ai entry in the LangChain Python documentation for full usage details.

Final Thoughts: Build Without Limits

The native synergy between LangChain's powerful orchestration (especially with LCEL) and Featherless's ChatFeatherlessAi component is set to redefine how developers build, test, and ship LLM-powered applications. By removing infrastructure bottlenecks and providing vast model choice through a dedicated integration, we're empowering you to focus solely on innovation. Cold starts, model hosting, and scaling headaches are now a thing of the past.

Ready to build your next groundbreaking LLM app without the usual friction? Join our Discord today to get help building your first app!

Try Featherless with LangChain's native integration today!

Featherless: Open source LLMs. One API. Zero infrastructure.