Model Overview
DavidAU/Llama-3.3-8B-Thinking-Gemini-Flash-11000x-128k is an 8-billion-parameter model based on the Llama 3.3 architecture, featuring an extended context window of 128k tokens (as reflected in the model name). The model has been fine-tuned with Unsloth on a large dataset (Gemini-2.5-flash-11000x) to give it enhanced reasoning and "thinking" capabilities that mimic the thought processes of Gemini models.
Key Capabilities
- Enhanced Reasoning: Specifically trained to "think" and reason through complex problems, similar to Gemini models.
- Extended Context: Supports a 128k-token context window, useful for long inputs, detailed analysis, and longer generations.
- Creative Generation: Excels at generating creative content, particularly stories and detailed explanations, by leveraging its thinking process.
- Knowledge Update: The tuning process also updated the model's core knowledge base.
Activation of Thinking
The model's "thinking" mode activates automatically on certain prompts, such as "explain...", "come up with a plan to...", "write a...", or "think about this and come up with a plan." Thinking can also be forced with a system prompt that instructs the model to produce an inner monologue before answering.
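As a sketch of how a forced-thinking system prompt might be wired up in an OpenAI-style chat request: the exact system-prompt wording and the `<think>` tag convention below are illustrative assumptions, not the card's canonical prompt.

```python
# Sketch only: the system-prompt text and the "<think>" tags are assumptions
# for illustration; consult the model card for the exact recommended wording.
def build_messages(user_prompt: str, force_thinking: bool = False):
    """Assemble a chat-message list for an OpenAI-style endpoint."""
    messages = []
    if force_thinking:
        # Hypothetical inner-monologue instruction.
        messages.append({
            "role": "system",
            "content": ("Before answering, think step by step inside "
                        "<think>...</think> tags, then give your final answer."),
        })
    messages.append({"role": "user", "content": user_prompt})
    return messages

msgs = build_messages("Come up with a plan to refactor a legacy codebase.",
                      force_thinking=True)
```

Prompts beginning with trigger phrases like "come up with a plan to..." should engage thinking even without the system message; the flag simply makes the behavior explicit.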
Suggested Settings
The following sampler settings are recommended:
- Temperature: 0.7
- Repetition Penalty: 1.05
- Top P: 0.95
- Min P: 0.05
- Top K: 40
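To illustrate what these samplers actually do, here is a minimal, dependency-free sketch of the usual llama.cpp-style filtering chain (top-k, then top-p, then min-p) with the values above; it shows which candidate tokens survive before sampling, and is not the model's own inference code.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def filter_candidates(logits, top_k=40, top_p=0.95, min_p=0.05):
    """Return the token ids that survive top-k, top-p, and min-p filtering.

    Standard definitions:
      - top-k: keep only the k highest-probability tokens
      - top-p: keep the smallest prefix whose cumulative probability >= p
      - min-p: drop tokens whose probability is below
               min_p * (probability of the single best token)
    """
    probs = softmax(logits)
    # Sort token ids by probability, highest first, and apply the top-k cut.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    order = order[:top_k]
    # top-p (nucleus) cut: stop once cumulative probability reaches p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # min-p cut, relative to the best token's probability.
    floor = min_p * probs[order[0]]
    return [i for i in kept if probs[i] >= floor]
```

For example, with logits `[5.0, 3.0, 1.0, 0.0]` the nucleus cut keeps tokens 0 and 1, while a sharply peaked distribution like `[6.0, 2.0, 1.0]` collapses to the single top token. The repetition penalty (1.05) acts earlier in the chain, scaling down the logits of already-generated tokens.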
For chat/roleplay, a smoothing factor of 1.5 is recommended in interfaces that support it, such as KoboldCpp or text-generation-webui. Run the model at Q4_K_S quantization or higher; lower-bit quants can degrade its reasoning.