MiniThinky 1B Overview
MiniThinky 1B is a 1 billion parameter model based on the Llama 3.2 architecture, developed by ngxson. This model represents an initial fine-tuning effort focused on integrating a distinct reasoning capability into a smaller language model. It operates with a substantial context length of 32768 tokens, allowing for more extensive input processing.
Key Capabilities & Features
- Explicit Reasoning Process: The model is designed to output a "thinking process" (marked by <|thinking|>) before delivering its final answer (marked by <|answer|>), providing insight into its internal deliberation.
- Llama 3 Chat Template: Uses the standard Llama 3 chat template for interaction, ensuring compatibility with existing Llama 3-based pipelines.
- System Message Sensitivity: Requires a precise system message at the beginning of each conversation to activate its intended reasoning behavior. The recommended prompt is:
You are MiniThinky, a helpful AI assistant. You always think before giving the answer. Use <|thinking|> before thinking and <|answer|> before giving the answer.
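As a sketch, the required system message can be placed into a Llama 3-format prompt assembled by hand. The special tokens below follow the standard Llama 3 chat template; in practice a tokenizer's chat-template support would produce this string for you, so treat the helper name and exact layout as illustrative assumptions:

```python
# Sketch: build a single-turn Llama 3 chat prompt that includes
# MiniThinky's required system message. Token names follow the
# standard Llama 3 chat template.

SYSTEM_MESSAGE = (
    "You are MiniThinky, a helpful AI assistant. You always think before "
    "giving the answer. Use <|thinking|> before thinking and <|answer|> "
    "before giving the answer."
)

def build_prompt(user_message: str) -> str:
    """Assemble a single-turn prompt in the Llama 3 chat format."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{SYSTEM_MESSAGE}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt("What is 2 + 2?")
print(prompt)
```

The prompt ends at the open assistant header, so generation begins exactly where the model's <|thinking|> block is expected to start.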
Important Considerations
- Experimental Nature: This model is described as an initial trial to add reasoning, indicating its developmental stage.
- Performance: The model was trained for 5 hours on 4x L40S GPUs to an evaluation loss of approximately 0.8, but specific benchmarks for its reasoning capabilities are not yet available.
- Limitations: The model currently does not excel at simple counting tasks (e.g., counting letters in a word).
- Newer Version Available: Users are directed to a newer checkpoint, MiniThinky-v2-1B-Llama-3.2, which may offer improved performance.
Good For
- Developers interested in experimenting with small models that exhibit an explicit reasoning step.
- Use cases where understanding the model's thought process is as important as the final answer.
- Further research and fine-tuning efforts to enhance reasoning in compact LLMs.
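Since the model prefixes its deliberation with <|thinking|> and its final response with <|answer|>, separating the two is a simple string split. The helper below is a minimal sketch (the function name and return shape are assumptions; the marker tokens come from the model card above):

```python
def split_thinking(output: str) -> tuple[str, str]:
    """Split a MiniThinky completion into (thinking, answer) parts.

    Assumes the model emits <|thinking|> ... <|answer|> ... as instructed
    by the system prompt; missing sections come back as empty strings.
    """
    thinking, answer = "", ""
    if "<|answer|>" in output:
        before, answer = output.split("<|answer|>", 1)
    else:
        before = output
    if "<|thinking|>" in before:
        thinking = before.split("<|thinking|>", 1)[1]
    return thinking.strip(), answer.strip()

example = "<|thinking|>2 + 2 is basic addition.<|answer|>4"
print(split_thinking(example))  # -> ('2 + 2 is basic addition.', '4')
```

Keeping the two parts separate lets an application log or display the thinking trace while showing only the answer to end users.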