Chat template kwargs

Pass model-specific chat template parameters to Featherless API requests.

Overview

chat_template_kwargs is an optional request-body field. Its contents are passed to the selected model's chat template as model-specific parameters.

Most users do not need this field. It is mainly useful for models whose chat templates expose extra controls, especially reasoning or "thinking" models where you may want to enable, disable, or budget reasoning.

Use chat_template_kwargs when you want to pass options that are not part of the standard OpenAI-compatible request body.

Where It Is Accepted

chat_template_kwargs can be included in request bodies for:

POST /v1/chat/completions
POST /debug/chat-format
POST /models/{owner}/{model}/debug/chat-format

Example

Chat Template Example

{
  "model": "Qwen/Qwen3-32B",
  "messages": [
    {
      "role": "user",
      "content": "Answer briefly: what is a Bloom filter?"
    }
  ],
  "chat_template_kwargs": {
    "enable_thinking": false
  }
}
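The same request body can be built from Python. The sketch below uses only the standard library and constructs the request without sending it. The full URL is an assumption: it joins the api.featherless.ai host from the curl example on this page with the /v1/chat/completions path listed above. The function name build_chat_request is hypothetical.

```python
import json
import urllib.request

# Assumption: host taken from the curl example on this page, joined with
# the /v1/chat/completions path listed under "Where It Is Accepted".
API_URL = "https://api.featherless.ai/v1/chat/completions"


def build_chat_request(api_key: str, enable_thinking: bool) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with chat_template_kwargs."""
    body = {
        "model": "Qwen/Qwen3-32B",
        "messages": [
            {"role": "user", "content": "Answer briefly: what is a Bloom filter?"}
        ],
        # Template options travel in their own field, next to the standard ones.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; keeping construction separate makes the body easy to inspect first.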

Supported Fields

Use enable_thinking to request thinking or reasoning behavior for models that support it.

Enable Thinking set to True

{
  "chat_template_kwargs": {
    "enable_thinking": true
  }
}

Set it to false to request non-thinking/chat mode when the model supports that mode:

Enable Thinking set to False

{
  "chat_template_kwargs": {
    "enable_thinking": false
  }
}

Not every model supports switching thinking on or off. If the model template does not use this option, it may have no effect.

The keys enable_thinking, thinking, and do_reasoning are synonyms for the same on/off toggle. Featherless normalizes them on every request, so you can use whichever key you prefer regardless of the model family. If you send more than one and they disagree, the disable (false) value wins.
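The disagreement rule can be illustrated with a short Python sketch. It mirrors the behavior described above and is not Featherless's actual implementation; the function name resolve_thinking_toggle is hypothetical.

```python
from typing import Any, Mapping, Optional

# The three synonym keys named above.
TOGGLE_KEYS = ("enable_thinking", "thinking", "do_reasoning")


def resolve_thinking_toggle(kwargs: Mapping[str, Any]) -> Optional[bool]:
    """Return the effective on/off value, or None when no toggle key is present."""
    values = [kwargs[key] for key in TOGGLE_KEYS if key in kwargs]
    if not values:
        return None  # no toggle sent, so the model family's default applies
    # A single False wins over any number of True values.
    return all(values)
```

Under this rule, sending enable_thinking: true together with do_reasoning: false resolves to false.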

Toggle Keys and Defaults by Model Family

The toggle keys take a boolean value (true or false). The Default column shows the value applied when the key is omitted. Not every model can be toggled; the always-on families in the last row reason regardless of kwargs.

For agentic, multi-step tool-use workloads, model performance depends on setting these values correctly: keep reasoning enabled and preserved across tool calls, or tool-calling and multi-step task quality degrade.

Model family	Toggle key	Default (if omitted)	Turn on / off
Qwen3 (e.g. Qwen3-235B-A22B)	`enable_thinking`	true	`enable_thinking: false` to disable
Qwen3.5 / 3.6	`enable_thinking`	true (small variants false)	`enable_thinking: false` to disable; `preserve_thinking: true` to retain reasoning
GLM (4.7 / 5 / 5.1)	`enable_thinking`	true	`enable_thinking: false` to disable; `clear_thinking: false` to retain reasoning
Gemma 4	`enable_thinking`	false	`enable_thinking: true` to enable
DeepSeek V3.1 / V3.2 / V4	`thinking`	false	`thinking: true` to enable
Kimi K2.5 / K2.6	`thinking`	true	`thinking: false` to disable; `preserve_thinking: true` to retain reasoning
Always-on (DeepSeek R1, Kimi-K2-Thinking, MiniMax-M2, gpt-oss, Step-3.5)	—	always reasons	Not toggleable via kwargs

Sources: vLLM — reasoning outputs · GLM-4.7 card · Qwen3.6 card · Kimi-K2.6 card · DeepSeek — thinking mode

Interleaved and Preserved Thinking (Agentic Use)

For agentic, multi-step tool use, these settings are not optional: reasoning models must carry their earlier thinking forward between tool calls, or tool-calling and multi-step task quality degrade. Keep thinking enabled, set the preserved-thinking kwarg where the family supports it, and resend the reasoning_content from previous turns.

Interleaved thinking

The model reasons between tool calls and after tool results, not only once at the start. For the families that support it, interleaved thinking is on by default; it is enabled by the server reasoning parser or chat template, not by a kwarg. Those families are GLM (since 4.5), Qwen3 / 3.5 / 3.6, Kimi (K2.5 / K2.6 / K2-Thinking), and MiniMax-M2. Keep prior thinking in the message history so it is not stripped between calls.

Sources: vLLM — interleaved thinking · Z.AI — GLM thinking-mode guide

Preserved thinking

Preserved thinking retains the full reasoning history across all turns, not just since the last user message, and is recommended for agent scenarios. GLM (4.7 / 5 / 5.1) uses clear_thinking: false, while Kimi K2.5 / K2.6 and Qwen3.5 / 3.6 use preserve_thinking: true. Newer GLM models clear thinking by default, so setting clear_thinking: false is what keeps it. These kwargs only apply to models whose templates support them; for interleaved-only models such as the original Qwen3 and Kimi-K2-Thinking, resend reasoning_content instead.

Sources: GLM-4.7 card · Kimi-K2.6 card · Qwen3.6 card
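For an agent loop, the per-family settings above might be collected in one place. The following Python sketch assumes hypothetical family labels ("glm", "qwen3.5", "kimi-k2.5") and hypothetical helper names. It also assumes that reasoning is resent as a reasoning_content field on the assistant message, which should be checked against the model card.

```python
from typing import Any, Dict, List

# Kwargs per family, copied from the text and table on this page.
# The dictionary keys are hypothetical labels chosen for this sketch,
# not identifiers the API recognizes.
PRESERVE_KWARGS: Dict[str, Dict[str, bool]] = {
    "glm": {"enable_thinking": True, "clear_thinking": False},
    "qwen3.5": {"enable_thinking": True, "preserve_thinking": True},
    "kimi-k2.5": {"thinking": True, "preserve_thinking": True},
}


def agentic_kwargs(family: str) -> Dict[str, bool]:
    """Return kwargs that keep thinking on and preserved for a known family.

    Families without a preserved-thinking kwarg get an empty dict; for those,
    resend reasoning_content in the message history instead.
    """
    return dict(PRESERVE_KWARGS.get(family, {}))


def append_assistant_turn(
    history: List[Dict[str, Any]], content: str, reasoning_content: str
) -> None:
    """Append an assistant turn that carries its reasoning forward.

    Assumption: reasoning travels as a reasoning_content field on the
    assistant message.
    """
    history.append(
        {"role": "assistant", "content": content, "reasoning_content": reasoning_content}
    )
```

The result of agentic_kwargs would go into chat_template_kwargs on each request, while append_assistant_turn keeps earlier reasoning in the messages array between tool calls.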

Some model templates use the older name do_reasoning.

do_reasoning

{
  "chat_template_kwargs": {
    "do_reasoning": false
  }
}

It is accepted as a synonym of enable_thinking / thinking, so you can use it interchangeably.

Use thinking_budget to request a reasoning token budget for templates that support it.

thinking_budget

{
  "chat_template_kwargs": {
    "thinking_budget": 1024
  }
}

This value is only meaningful for models that support a thinking or reasoning budget.

Custom Template Variables

chat_template_kwargs can also carry model-specific template variables.

For example, if a model template supports a custom variable like date_string, you can pass it like this:

date_string

{
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "What date is shown to the model?"
    }
  ],
  "chat_template_kwargs": {
    "date_string": "25 May 2026"
  }
}

Unknown keys are accepted, but they only affect output if the selected model's chat template actually uses them.

How Featherless Applies These Values

When Featherless applies a chat template, values are applied in this order:

1. Featherless provides default template context values.

2. Your chat_template_kwargs are added.

3. Featherless applies required system values, such as the generation prompt and tool settings.

Your kwargs can override default context values, but they cannot override required system values.
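The precedence described above behaves like a layered dictionary merge in which later layers win. The Python sketch below illustrates the ordering only; the function name and the key add_generation_prompt in the usage example are illustrative assumptions, not the actual names Featherless uses internally.

```python
from typing import Any, Dict, Mapping


def build_template_context(
    defaults: Mapping[str, Any],
    user_kwargs: Mapping[str, Any],
    required: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge the three layers so that later layers override earlier ones."""
    context = dict(defaults)      # 1. default template context values
    context.update(user_kwargs)   # 2. chat_template_kwargs from the request
    context.update(required)      # 3. required system values always win
    return context


# Illustrative keys and values only.
context = build_template_context(
    defaults={"enable_thinking": True},
    user_kwargs={"enable_thinking": False, "add_generation_prompt": False},
    required={"add_generation_prompt": True},
)
print(context)
```

Here the user's enable_thinking replaces the default, while the attempt to change the required value has no effect.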

Preview The Rendered Prompt

You can preview how a request will be formatted with the debug endpoint:

Curl Example

curl https://api.featherless.ai/models/Qwen/Qwen3-32B/debug/chat-format \
  -H "Authorization: Bearer $FEATHERLESS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-32B",
    "messages": [
      {
        "role": "user",
        "content": "Answer briefly: what is a Bloom filter?"
      }
    ],
    "chat_template_kwargs": {
      "enable_thinking": false
    }
  }'

The response includes:

formatted_prompt: the rendered prompt text
token_count: the prompt token count
template_info: basic formatting metadata
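A Python equivalent of the preview call might look like the sketch below. It builds the request for the model-scoped debug endpoint and reads the two scalar fields listed above. The helper names are hypothetical, and the assumption that token_count is a number and formatted_prompt a string is not confirmed by this page.

```python
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "https://api.featherless.ai"  # host from the curl example above


def build_debug_request(
    api_key: str,
    model: str,
    messages: List[Dict[str, Any]],
    chat_template_kwargs: Dict[str, Any],
) -> urllib.request.Request:
    """Build (but do not send) a POST to /models/{owner}/{model}/debug/chat-format."""
    body = {
        "model": model,
        "messages": messages,
        "chat_template_kwargs": chat_template_kwargs,
    }
    return urllib.request.Request(
        f"{BASE_URL}/models/{model}/debug/chat-format",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def summarize_debug_response(payload: Dict[str, Any]) -> str:
    """Format the token count and rendered prompt from a decoded response."""
    return f"{payload['token_count']} prompt tokens\n{payload['formatted_prompt']}"
```

Comparing the formatted_prompt for enable_thinking set to true and to false is a quick way to confirm whether a model's template uses the option at all.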

Notes

Most requests do not need chat_template_kwargs.
Unknown keys are allowed, but unsupported keys may be ignored.
thinking_budget only works for models whose templates support a reasoning budget.
Some reasoning models may not support disabling thinking.
For privacy, Featherless records only the names of chat_template_kwargs keys for operational visibility, not their values.

Last edited: Jul 16, 2026