Model Overview
DavidAU/Llama-3.3-8B-Thinking-Gemini-Flash-11000x-128k is an 8-billion-parameter model based on the Llama 3.3 architecture, featuring an extended context window of 128k tokens (as reflected in the model name). The model has been fine-tuned with Unsloth on a large dataset (Gemini-2.5-flash-11000x) to give it enhanced reasoning and "thinking" capabilities that mimic the thought processes of Gemini models.
Key Capabilities
- Enhanced Reasoning: Specifically trained to "think" and reason through complex problems, similar to Gemini models.
- Extended Context: Supports a 128k-token context window, useful for long inputs, detailed analysis, and longer generations.
- Creative Generation: Excels at generating creative content, particularly stories and detailed explanations, by leveraging its thinking process.
- Knowledge Update: The tuning process also updated the model's core knowledge base.
Activation of Thinking
The model's "thinking" mode activates automatically on certain prompts, such as "explain...", "come up with a plan to...", "write a...", or "think about this and come up with a plan." Thinking can also be forced with a system prompt that instructs the model to produce an inner monologue before answering.
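As a sketch of how a forced-thinking system prompt might be wired up in an OpenAI-style chat request: the exact system-prompt wording and the `<think>` tag convention below are illustrative assumptions, not the card's canonical prompt.

```python
# Sketch only: the system-prompt text and the "<think>" tags are assumptions
# for illustration; consult the model card for the exact recommended wording.
def build_messages(user_prompt: str, force_thinking: bool = False):
    """Assemble a chat-message list for an OpenAI-style endpoint."""
    messages = []
    if force_thinking:
        # Hypothetical inner-monologue instruction.
        messages.append({
            "role": "system",
            "content": ("Before answering, think step by step inside "
                        "<think>...</think> tags, then give your final answer."),
        })
    messages.append({"role": "user", "content": user_prompt})
    return messages

msgs = build_messages("Come up with a plan to refactor a legacy codebase.",
                      force_thinking=True)
```

Prompts beginning with trigger phrases like "come up with a plan to..." should engage thinking even without the system message; the flag simply makes the behavior explicit.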
Suggested Settings
The following sampler settings are recommended:
- Temperature: 0.7
- Repetition Penalty: 1.05
- Top P: 0.95
- Min P: 0.05
- Top K: 40
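To illustrate what these samplers actually do, here is a minimal, dependency-free sketch of the usual llama.cpp-style filtering chain (top-k, then top-p, then min-p) with the values above; it shows which candidate tokens survive before sampling, and is not the model's own inference code.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def filter_candidates(logits, top_k=40, top_p=0.95, min_p=0.05):
    """Return the token ids that survive top-k, top-p, and min-p filtering.

    Standard definitions:
      - top-k: keep only the k highest-probability tokens
      - top-p: keep the smallest prefix whose cumulative probability >= p
      - min-p: drop tokens whose probability is below
               min_p * (probability of the single best token)
    """
    probs = softmax(logits)
    # Sort token ids by probability, highest first, and apply the top-k cut.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    order = order[:top_k]
    # top-p (nucleus) cut: stop once cumulative probability reaches p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # min-p cut, relative to the best token's probability.
    floor = min_p * probs[order[0]]
    return [i for i in kept if probs[i] >= floor]
```

For example, with logits `[5.0, 3.0, 1.0, 0.0]` the nucleus cut keeps tokens 0 and 1, while a sharply peaked distribution like `[6.0, 2.0, 1.0]` collapses to the single top token. The repetition penalty (1.05) acts earlier in the chain, scaling down the logits of already-generated tokens.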
For chat/roleplay, a smoothing factor of 1.5 is recommended in interfaces that support it, such as KoboldCpp or text-generation-webui. Run the model at Q4_K_S quantization or higher; lower-bit quants can degrade its reasoning.