DavidAU/Llama-3.3-8B-Thinking-Gemini-Flash-11000x-128k

Text generation · Model size: 8B · Quant: FP8 · Context length: 8k · Published: Jan 3, 2026 · License: apache-2.0 · Architecture: Transformer · Concurrency cost: 1 · Open weights

DavidAU/Llama-3.3-8B-Thinking-Gemini-Flash-11000x-128k is an 8 billion parameter Llama 3.3 model with an extended 8192 token context length. It has been fine-tuned using Unsloth and a Gemini-2.5-flash-11000x dataset to enhance its reasoning capabilities, allowing it to "think" like Gemini. This model is specifically optimized for complex reasoning tasks and creative content generation, with a focus on deep thought processes.


Model Overview

DavidAU/Llama-3.3-8B-Thinking-Gemini-Flash-11000x-128k is an 8 billion parameter model based on the Llama 3.3 architecture, featuring an extended context window of 8192 tokens. This model has undergone specialized fine-tuning using Unsloth and a large dataset (Gemini-2.5-flash-11000x) to imbue it with enhanced reasoning and "thinking" capabilities, mimicking the thought processes of Gemini models.

Key Capabilities

  • Enhanced Reasoning: Specifically trained to "think" and reason through complex problems, similar to Gemini models.
  • Extended Context: Supports an 8192-token context length, beneficial for detailed analysis and longer generations.
  • Creative Generation: Excels at generating creative content, particularly stories and detailed explanations, by leveraging its thinking process.
  • Knowledge Update: The tuning process also updated the model's core knowledge base.

Activation of Thinking

The model's "thinking" mode is activated automatically by certain prompts, such as "explain," "come up with a plan to...", "write a...", or "think about this and come up with a plan." Thinking can also be forced explicitly via a system prompt that instructs the model to generate an inner monologue.
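To show how this fits into a typical local-inference workflow, here is a minimal sketch that assembles an OpenAI-style chat message list (the format accepted by llama-cpp-python and most local servers). The exact system prompt text is not quoted on this card, so the one below is purely hypothetical:

```python
# HYPOTHETICAL system prompt: the model card says a system prompt can
# force inner-monologue generation but does not quote its exact wording.
FORCE_THINKING_SYSTEM = (
    "You are a deep-thinking AI. Reason step by step inside "
    "<think>...</think> tags before giving your final answer."
)

def build_messages(user_prompt, force_thinking=False):
    """Assemble an OpenAI-style chat payload for the model."""
    messages = []
    if force_thinking:
        messages.append({"role": "system", "content": FORCE_THINKING_SYSTEM})
    messages.append({"role": "user", "content": user_prompt})
    return messages

# A "come up with a plan to..." prompt triggers thinking on its own:
auto = build_messages("Come up with a plan to refactor this module.")

# For prompts that would not trigger it, force thinking via the system prompt:
forced = build_messages("Write a short story about a lighthouse.",
                        force_thinking=True)
```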

Suggested Settings

For best results, the following sampling settings are suggested:

  • Temperature: 0.7
  • Repetition Penalty: 1.05
  • Top P: 0.95
  • Min P: 0.05
  • Top K: 40

For chat/roleplay, a "Smoothing_factor" of 1.5 is recommended in compatible interfaces such as KoboldCpp or text-generation-webui. The model is best run at Q4_K_S quantization or higher to avoid degraded reasoning.
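To make the suggested settings concrete, here is a rough illustration of what the three truncation samplers (Top K, Min P, Top P) do to a next-token distribution. This is a toy sketch, not the inference engine's actual code; the function name and toy probabilities are invented for illustration:

```python
def filter_probs(probs, top_k=40, top_p=0.95, min_p=0.05):
    """Sketch of top-k -> min-p -> top-p filtering on a token->prob dict.

    Temperature (0.7) and repetition penalty (1.05) act on the raw
    logits before this stage; only the truncation samplers are shown.
    """
    # Sort tokens by probability, highest first.
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # Top-k: keep only the k most likely tokens.
    items = items[:top_k]
    # Min-p: drop tokens below min_p * (probability of the top token).
    floor = min_p * items[0][1]
    items = [(t, p) for t, p in items if p >= floor]
    # Top-p (nucleus): keep the smallest prefix reaching top_p mass.
    kept, cum = [], 0.0
    for tok, p in items:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    # Renormalize the surviving tokens into a proper distribution.
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

toy = {"a": 0.6, "b": 0.25, "c": 0.12, "d": 0.02, "e": 0.01}
filtered = filter_probs(toy)  # "d" and "e" are filtered out
```

With these defaults, Min P removes the low-probability tail (anything under 5% of the top token's probability) and Top P then trims the candidate set to the smallest group covering 95% of the remaining mass, which keeps generation varied without sampling from implausible tokens.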