Chat

The chat endpoint generates a completion for the provided chat conversation. The tokens generated from the chat completion count toward the token limit.

HTTP request

POST https://api.featherless.ai/v1/chat/completions

Request body

{
  "model": "string",
  "messages": [
    {
      "role": "system",
      "content": "string"
    },
    {
      "role": "user",
      "content": "string"
    }
  ],
  "presence_penalty": "float",
  "frequency_penalty": "float",
  "repetition_penalty": "float",
  "temperature": "float",
  "top_p": "float",
  "top_k": "integer",
  "min_p": "float",
  "seed": "integer",
  "stop": ["string"],
  "stop_token_ids": ["integer"],
  "include_stop_str_in_output": "boolean",
  "max_tokens": "integer",
  "min_tokens": "integer"
}

Parameters

  • model (string): ID of the model to use for generating chat completions.
  • messages (array): A list of messages comprising the conversation so far.
  • presence_penalty (float): Penalizes new tokens based on their presence in the generated text so far. Values > 0 encourage new tokens; values < 0 encourage repetition.
  • frequency_penalty (float): Penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage new tokens; values < 0 encourage repetition.
  • repetition_penalty (float): Penalizes new tokens based on their appearance in the prompt and generated text. Values > 1 encourage new tokens; values < 1 encourage repetition.
  • temperature (float): Controls sampling randomness. Lower values make the model more deterministic; higher values introduce more randomness. Zero is greedy sampling.
  • top_p (float): Cumulative probability of the top tokens to consider. Must be in (0, 1]. Set to 1 to consider all tokens.
  • top_k (integer): Number of top tokens to consider. Set to -1 to consider all tokens.
  • min_p (float): Minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable.
  • seed (integer): Random seed for generation. Not fully deterministic, since requests may be served by different servers.
  • stop (array): List of strings that stop generation when produced. The returned output excludes these strings.
  • stop_token_ids (array): List of token IDs that stop generation when produced. The returned output may include these tokens unless they are special tokens.
  • include_stop_str_in_output (boolean): If true, stop strings are included in the output text. Defaults to false.
  • max_tokens (integer): Maximum number of tokens generated per output sequence.
  • min_tokens (integer): Minimum number of tokens generated per output sequence before EOS or stop_token_ids can be produced.
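
The parameters above can be assembled into a request payload with a small helper. This is a minimal sketch: the helper name, the default values, and the model name are illustrative, not API defaults.

```javascript
// Build a chat completion request payload.
// The fallback sampling values here are illustrative choices, not the
// API's actual defaults; anything passed in `options` overrides them.
function buildChatRequest(model, messages, options = {}) {
  return {
    model,
    messages,
    temperature: 0.7,
    top_p: 0.9,
    max_tokens: 256,
    ...options,
  };
}

const payload = buildChatRequest("GalrionSoftworks/Margnum-12B-v1", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Hello!" },
]);
```

The resulting object can be passed directly to JSON.stringify as the request body.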

Response body

If successful, the response body will contain data with the following structure:

{
  "id": "string",
  "object": "chat.completion",
  "created": "integer",
  "model": "string",
  "choices": [
    {
      "index": "integer",
      "message": {
        "role": "string",
        "content": "string"
      },
      "finish_reason": "string"
    }
  ],
  "usage": {
    "prompt_tokens": "integer",
    "completion_tokens": "integer",
    "total_tokens": "integer"
  }
}
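
A response with this shape can be consumed as follows. This sketch uses a hard-coded sample object (with illustrative values) in place of a live API response:

```javascript
// Sample response in the documented shape; all values are illustrative.
const response = {
  id: "chatcmpl-123",
  object: "chat.completion",
  created: 1700000000,
  model: "GalrionSoftworks/Margnum-12B-v1",
  choices: [
    {
      index: 0,
      message: { role: "assistant", content: "Hello! How can I help?" },
      finish_reason: "stop",
    },
  ],
  usage: { prompt_tokens: 9, completion_tokens: 7, total_tokens: 16 },
};

// The generated text lives at choices[0].message.content.
const reply = response.choices[0].message.content;
const tokensUsed = response.usage.total_tokens;
```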

Example request

curl https://api.featherless.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FEATHERLESS_API_KEY" \
  -d '{
    "model": "GalrionSoftworks/Margnum-12B-v1",
    "messages": [{"role": "user", "content": "Hello!"}],
    "presence_penalty": 0.5,
    "frequency_penalty": 0.5,
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 100
  }'

Example: Passing Application Headers

When making API requests, you can include custom headers to help identify and monitor your application's usage. These headers provide additional context about the requests, which can be useful for analytics, usage tracking, and debugging.

HTTP Request Example

fetch("https://api.featherless.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${FEATHERLESS_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional: Include your app's URL for tracking
    "X-Title": `${YOUR_SITE_NAME}`,     // Optional: Identify your app in API analytics
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "GalrionSoftworks/Margnum-12B-v1",
    "messages": [
      {"role": "user", "content": "What is the meaning of life?"},
    ],
  })
});

Explanation

  • Authorization: This header contains your API key, which is required to authenticate your requests.
  • HTTP-Referer: The HTTP-Referer header is optional but recommended if you want to track where the API requests are originating from. By including your site's URL, you can gain insights into which applications or services are interacting with the API.
  • X-Title: The X-Title header is also optional and is used to specify the name of your application. This can help in identifying your application in API usage reports, making it easier to monitor and analyze your application's interaction with the API.

By passing these headers, you enhance the visibility of your application's usage within the API's analytics, which can aid in tracking performance, usage trends, and potential issues.
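
One way to keep this consistent across requests is a small helper that adds the optional tracking headers only when values are provided. The helper name and option names below are placeholders for your own configuration, not part of the API:

```javascript
// Build request headers, attaching the optional tracking headers
// only when a value is supplied.
function buildHeaders(apiKey, { siteUrl, siteName } = {}) {
  const headers = {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  if (siteUrl) headers["HTTP-Referer"] = siteUrl; // optional: app URL for tracking
  if (siteName) headers["X-Title"] = siteName;    // optional: app name for analytics
  return headers;
}

const headers = buildHeaders("sk-example", {
  siteUrl: "https://featherlesschat.com",
  siteName: "Featherless Chat",
});
```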

Example Use Case

Consider an application named "Featherless Chat" that interacts with the API to generate chat responses. By including the HTTP-Referer and X-Title headers in each API request, the developers can easily monitor how often the application is used and identify it in the API's analytics dashboard.

fetch("https://api.featherless.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${FEATHERLESS_API_KEY}`,
    "HTTP-Referer": "https://featherlesschat.com", // Track the application URL
    "X-Title": "Featherless Chat",                 // Identify the application by name
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "GalrionSoftworks/Margnum-12B-v1",
    "messages": [
      {"role": "user", "content": "How is the weather today?"},
    ],
  })
});

Completions

Endpoint

POST https://api.featherless.ai/v1/completions

This endpoint generates a completion for the provided prompt and parameters using the specified model.

Request Body

Parameters

  • model (string): Required. ID of the model to use.
  • prompt (string): Required. The prompt(s) to generate completions for, encoded as a string.
  • presence_penalty (float): Optional. Penalizes new tokens based on their presence in the generated text so far. Values > 0 encourage new tokens; values < 0 encourage repetition.
  • frequency_penalty (float): Optional. Penalizes new tokens based on their frequency in the generated text. Values > 0 encourage new tokens; values < 0 encourage repetition.
  • repetition_penalty (float): Optional. Penalizes new tokens based on their appearance in the prompt and generated text. Values > 1 encourage new tokens; values < 1 encourage repetition.
  • temperature (float): Optional. Controls the randomness of sampling. Lower values make the output more deterministic; higher values add more randomness. Zero is greedy sampling.
  • top_p (float): Optional. Cumulative probability of the most likely tokens to consider. Must be in (0, 1]. Set to 1 to consider all tokens.
  • top_k (integer): Optional. Number of top tokens to consider in sampling. Set to -1 to consider all tokens.
  • min_p (float): Optional. Minimum probability threshold, relative to the most likely token, for a token to be considered. Must be in [0, 1]. Set to 0 to disable.
  • seed (integer): Optional. Random seed for generation. Not fully deterministic, since requests may be served by different servers.
  • stop (array): Optional. A list of strings that, when encountered in the generated output, stop further generation. The returned output excludes these strings.
  • stop_token_ids (array): Optional. Like stop, but uses token IDs to halt generation. The output may include these tokens unless they are special tokens.
  • include_stop_str_in_output (boolean): Optional. If true, stop strings are included in the output text. Defaults to false.
  • max_tokens (integer): Optional. Maximum number of tokens to generate in the completion.
  • min_tokens (integer): Optional. Minimum number of tokens to generate before EOS or stop_token_ids can be produced.

Example Request

{
  "model": "GalrionSoftworks/Margnum-12B-v1",
  "prompt": "Once upon a time",
  "temperature": 0.7,
  "max_tokens": 150,
  "top_p": 0.9,
  "frequency_penalty": 0.5,
  "presence_penalty": 0.0
}

Example Response

{
  "id": "cmpl-6YgK3ASw92kT14L5f8zJQ7yY",
  "object": "text_completion",
  "created": 1630569482,
  "model": "GalrionSoftworks/Margnum-12B-v1",
  "choices": [
    {
      "text": " in a land far, far away, there lived a wise old owl.",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 15,
    "total_tokens": 20
  }
}
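
When consuming a completion, it is worth inspecting finish_reason before using the text. In OpenAI-compatible APIs, "stop" conventionally means generation ended naturally while "length" means max_tokens was reached; treat that mapping as an assumption here, since this document only shows "stop". A sketch:

```javascript
// Extract the completion text and flag possible truncation.
// Assumption: "length" signals that max_tokens was hit, following the
// OpenAI-compatible convention; this document does not list all values.
function extractCompletion(response) {
  const choice = response.choices[0];
  return {
    text: choice.text,
    truncated: choice.finish_reason === "length",
  };
}

const sample = {
  choices: [{ text: " in a land far, far away...", index: 0, finish_reason: "stop" }],
};
const result = extractCompletion(sample);
```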

Usage Notes

  • Presence Penalty: Encourages the model to generate new tokens instead of repeating the same ones, ideal for creative writing.
  • Frequency Penalty: Helps balance repetition, useful in avoiding the overuse of specific words or phrases.
  • Temperature & Top-p: Together control the randomness and diversity of the output, crucial for customizing the creativity of the completion.
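
The effect of temperature can be illustrated by how it reshapes a token distribution before sampling. This is a conceptual sketch of temperature scaling, not the server's actual implementation:

```javascript
// Apply temperature scaling to raw logits, then renormalize with softmax.
// Lower temperature sharpens the distribution toward the top token;
// as temperature approaches 0, sampling approaches greedy (argmax).
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map((l) => l / temperature);
  const maxLogit = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((l) => Math.exp(l - maxLogit));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [2.0, 1.0, 0.1];       // illustrative raw scores for 3 tokens
const sharp = softmaxWithTemperature(logits, 0.5); // more peaked
const flat = softmaxWithTemperature(logits, 2.0);  // closer to uniform
```

Top-p then operates on the resulting distribution, keeping only the smallest set of tokens whose cumulative probability reaches the threshold.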

This API structure provides flexibility in generating text by allowing fine-tuning with various sampling parameters, making it adaptable for different use cases.