Chat
The chat endpoint generates a completion for the provided chat conversation. The tokens generated from the chat completion count toward the token limit.
HTTP request
POST https://api.featherless.ai/v1/chat/completions
Request body
{
"model": "string",
"messages": [
{
"role": "system",
"content": "string"
},
{
"role": "user",
"content": "string"
}
],
"presence_penalty": "float",
"frequency_penalty": "float",
"repetition_penalty": "float",
"temperature": "float",
"top_p": "float",
"top_k": "integer",
"min_p": "float",
"seed": "integer",
"stop": ["string"],
"stop_token_ids": ["integer"],
"include_stop_str_in_output": "boolean",
"max_tokens": "integer",
"min_tokens": "integer"
}
Parameters
Parameter | Type | Description |
---|---|---|
model | string | ID of the model to use for generating chat completions. |
messages | array | A list of messages comprising the conversation so far. |
presence_penalty | float | Penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage new tokens; values < 0 encourage repetition. |
frequency_penalty | float | Penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage new tokens; values < 0 encourage repetition. |
repetition_penalty | float | Penalizes new tokens based on their appearance in the prompt and generated text. Values > 1 encourage new tokens; values < 1 encourage repetition. |
temperature | float | Controls sampling randomness. Lower values make the model more deterministic; higher values introduce randomness. Zero is greedy sampling. |
top_p | float | Controls the cumulative probability of considered top tokens. Must be in (0, 1]. Set to 1 to consider all tokens. |
top_k | integer | Number of top tokens to consider. Set to -1 to consider all tokens. |
min_p | float | Minimum probability for a token to be considered, relative to the most likely token. Must be in [0, 1]. Set to 0 to disable. |
seed | integer | Random seed for generation. Best-effort only: because requests may be served by different servers, identical seeds are not guaranteed to reproduce outputs. |
stop | array | List of strings that stop generation when generated. The returned output excludes these strings. |
stop_token_ids | array | List of tokens that stop generation when generated. The returned output may include these tokens unless they are special tokens. |
include_stop_str_in_output | boolean | Whether to include the stop strings in the output text. Defaults to false. |
max_tokens | integer | Maximum number of tokens generated per output sequence. |
min_tokens | integer | Minimum number of tokens generated per output sequence before EOS or stop_token_ids can be generated. |
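The top_k, top_p, and min_p parameters above each restrict the candidate token set before sampling. The sketch below illustrates their documented semantics in plain Python; it is an assumed, simplified model of the filtering, not the server's implementation.

```python
def filter_candidates(probs, top_k=-1, top_p=1.0, min_p=0.0):
    """Return indices of tokens that survive the three sampling filters.

    probs: list of token probabilities (assumed to sum to ~1.0).
    """
    # Sort token indices by probability, most likely first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top_k: keep only the k most likely tokens (-1 keeps all).
    if top_k != -1:
        order = order[:top_k]

    # top_p (nucleus sampling): keep the smallest prefix of tokens whose
    # cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # min_p: drop tokens whose probability falls below min_p times the
    # probability of the most likely token (0 disables this filter).
    threshold = min_p * probs[kept[0]]
    return [i for i in kept if probs[i] >= threshold]

probs = [0.5, 0.25, 0.15, 0.07, 0.03]
print(filter_candidates(probs, top_p=0.9, min_p=0.2))  # [0, 1, 2]
```

With top_p=0.9 the cumulative mass 0.5 + 0.25 + 0.15 reaches 0.9 at the third token, and min_p=0.2 sets a threshold of 0.1, which all three survive.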
Response body
If successful, the response body will contain data with the following structure:
{
"id": "string",
"object": "chat.completion",
"created": "integer",
"model": "string",
"choices": [
{
"index": "integer",
"message": {
"role": "string",
"content": "string"
},
"finish_reason": "string"
}
],
"usage": {
"prompt_tokens": "integer",
"completion_tokens": "integer",
"total_tokens": "integer"
}
}
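A client typically needs only a few fields from this structure. The sketch below parses a hypothetical response body (the values are made up for illustration) and extracts the assistant's reply, the finish reason, and the token usage that counts toward the token limit.

```python
import json

# Hypothetical response body matching the documented structure.
raw = '''{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "GalrionSoftworks/Margnum-12B-v1",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 8, "total_tokens": 17}
}'''

response = json.loads(raw)
reply = response["choices"][0]["message"]["content"]   # the assistant's text
finish = response["choices"][0]["finish_reason"]       # e.g. "stop" or "length"
spent = response["usage"]["total_tokens"]              # counts toward the token limit
print(reply, finish, spent)
```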
Example request
curl https://api.featherless.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $FEATHERLESS_API_KEY" \
-d '{
"model": "GalrionSoftworks/Margnum-12B-v1",
"messages": [{"role": "user", "content": "Hello!"}],
"presence_penalty": 0.5,
"frequency_penalty": 0.5,
"temperature": 0.7,
"top_p": 0.9,
"max_tokens": 100
}'
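The same request can be built in Python with only the standard library. This sketch mirrors the curl example's payload and headers; it assumes FEATHERLESS_API_KEY is set in the environment, and the actual send is left commented out.

```python
import json
import os
import urllib.request

payload = {
    "model": "GalrionSoftworks/Margnum-12B-v1",
    "messages": [{"role": "user", "content": "Hello!"}],
    "presence_penalty": 0.5,
    "frequency_penalty": 0.5,
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 100,
}

request = urllib.request.Request(
    "https://api.featherless.ai/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('FEATHERLESS_API_KEY', '')}",
    },
    method="POST",
)

# Uncomment to actually send the request:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```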
Example: Passing Application Headers
When making API requests, you can include custom headers to help identify and monitor your application's usage. These headers provide additional context about the requests, which can be useful for analytics, usage tracking, and debugging.
HTTP Request Example
fetch("https://api.featherless.ai/v1/chat/completions", {
method: "POST",
headers: {
"Authorization": `Bearer ${FEATHERLESS_API_KEY}`,
"HTTP-Referer": `${YOUR_SITE_URL}`, // Optional: Include your app's URL for tracking
"X-Title": `${YOUR_SITE_NAME}`, // Optional: Identify your app in API analytics
"Content-Type": "application/json"
},
body: JSON.stringify({
"model": "GalrionSoftworks/Margnum-12B-v1",
"messages": [
{"role": "user", "content": "What is the meaning of life?"},
],
})
});
Explanation
- Authorization: This header contains your API key, which is required to authenticate your requests.
- HTTP-Referer: The HTTP-Referer header is optional but recommended if you want to track where API requests originate. By including your site's URL, you can gain insight into which applications or services are interacting with the API.
- X-Title: The X-Title header is also optional and specifies the name of your application. This helps identify your application in API usage reports, making it easier to monitor and analyze its interaction with the API.
By passing these headers, you enhance the visibility of your application's usage within the API's analytics, which can aid in tracking performance, usage trends, and potential issues.
Example Use Case
Consider an application named "Featherless Chat" that interacts with the API to generate chat responses. By including the HTTP-Referer and X-Title headers in each API request, the developers can easily monitor how often the application is used and identify it in the API's analytics dashboard.
fetch("https://api.featherless.ai/v1/chat/completions", {
method: "POST",
headers: {
"Authorization": `Bearer ${FEATHERLESS_API_KEY}`,
"HTTP-Referer": "https://featherlesschat.com", // Track the application URL
"X-Title": "Featherless Chat", // Identify the application by name
"Content-Type": "application/json"
},
body: JSON.stringify({
"model": "GalrionSoftworks/Margnum-12B-v1",
"messages": [
{"role": "user", "content": "How is the weather today?"},
],
})
});
Completions
Endpoint
POST https://api.featherless.ai/v1/completions
This endpoint generates a completion for the provided prompt and parameters using the specified model.
Request Body
Parameters
Parameter | Type | Description |
---|---|---|
model | string | Required. ID of the model to use. |
prompt | string | Required. The prompt(s) to generate completions for, encoded as a string. |
presence_penalty | float | Optional. Penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage new tokens; values < 0 encourage repetition. |
frequency_penalty | float | Optional. Penalizes new tokens based on their frequency in the generated text. Values > 0 encourage new tokens; values < 0 encourage repetition. |
repetition_penalty | float | Optional. Penalizes new tokens based on their appearance in the prompt and generated text. Values > 1 encourage new tokens; values < 1 encourage repetition. |
temperature | float | Optional. Controls the randomness of sampling. Lower values make the output more deterministic; higher values add more randomness. Zero is greedy sampling. |
top_p | float | Optional. Controls the cumulative probability of the most likely tokens. Must be between 0 and 1. Setting this to 1 considers all tokens. |
top_k | integer | Optional. The number of top tokens to consider in the sampling process. Set to -1 to consider all tokens. |
min_p | float | Optional. Sets a minimum probability threshold relative to the most likely token for a token to be considered. Must be between 0 and 1. Set to 0 to disable. |
seed | integer | Optional. Sets a random seed for generation. Not always reliable as multiple servers may be used. |
stop | array | Optional. A list of strings that, when encountered in the generated output, will stop further generation. The returned output excludes these strings. |
stop_token_ids | array | Optional. Similar to stop, but uses token IDs to halt generation. The output may include these tokens unless they are special tokens. |
include_stop_str_in_output | boolean | Optional. If set to true, includes stop strings in the output text. Defaults to false. |
max_tokens | integer | Optional. The maximum number of tokens to generate in the completion. |
min_tokens | integer | Optional. The minimum number of tokens to generate before EOS or stop_token_ids can be generated. |
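The interaction between stop and include_stop_str_in_output can be sketched as follows. This is an illustrative model of the documented behavior (truncate at the earliest stop string; include it only when the flag is true), not the server's actual code.

```python
def truncate_at_stop(text, stop, include_stop_str_in_output=False):
    """Truncate `text` at the earliest occurrence of any stop string."""
    # Collect (position, length) for every stop string present in the text.
    matches = [(text.find(s), len(s)) for s in stop if s in text]
    if not matches:
        return text
    idx, length = min(matches)  # earliest match wins
    return text[: idx + length] if include_stop_str_in_output else text[:idx]

print(truncate_at_stop("Once upon a time. The end.", ["."]))  # "Once upon a time"
```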
Example Request
{
"model": "GalrionSoftworks/Margnum-12B-v1",
"prompt": "Once upon a time",
"temperature": 0.7,
"max_tokens": 150,
"top_p": 0.9,
"frequency_penalty": 0.5,
"presence_penalty": 0.0
}
Example Response
{
"id": "cmpl-6YgK3ASw92kT14L5f8zJQ7yY",
"object": "text_completion",
"created": 1630569482,
"model": "GalrionSoftworks/Margnum-12B-v1",
"choices": [
{
"text": " in a land far, far away, there lived a wise old owl.",
"index": 0,
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 5,
"completion_tokens": 15,
"total_tokens": 20
}
}
Usage Notes
- Presence Penalty: Encourages the model to generate new tokens instead of repeating the same ones, ideal for creative writing.
- Frequency Penalty: Helps balance repetition, useful in avoiding the overuse of specific words or phrases.
- Temperature & Top-p: Together control the randomness and diversity of the output, crucial for customizing the creativity of the completion.
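The penalty semantics described above can be sketched in a few lines. This is an assumed, simplified model of how the three penalties adjust token scores (presence is a flat penalty, frequency scales with count, repetition is multiplicative), not the server's exact implementation.

```python
from collections import Counter

def apply_penalties(logits, generated_ids,
                    presence_penalty=0.0,
                    frequency_penalty=0.0,
                    repetition_penalty=1.0):
    """Return a penalized copy of `logits` (raw score per vocabulary id)."""
    counts = Counter(generated_ids)
    out = list(logits)
    for token_id, count in counts.items():
        # presence: flat penalty applied once a token has appeared at all
        out[token_id] -= presence_penalty
        # frequency: penalty grows with how often the token has appeared
        out[token_id] -= frequency_penalty * count
        # repetition: multiplicative; values > 1 push scores away from reuse
        if repetition_penalty != 1.0:
            if out[token_id] > 0:
                out[token_id] /= repetition_penalty
            else:
                out[token_id] *= repetition_penalty
    return out

# Token 1 appeared twice: 1.0 - 0.5 (presence) - 2 * 0.25 (frequency) = 0.0
print(apply_penalties([2.0, 1.0, 0.5], [1, 1],
                      presence_penalty=0.5, frequency_penalty=0.25))
```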
This API structure provides flexibility in generating text by allowing fine-tuning with various sampling parameters, making it adaptable for different use cases.