Overview
Drummond-1b1-Instruct: Portuguese Chain-of-Thought Reasoning
The Drummond-1b1-Instruct is a 1.1 billion parameter model developed by Corre Social, specifically fine-tuned for instruction following and reasoning in Portuguese (PT-BR). It is based on the Tucano-1b1-Instruct architecture and is optimized for generating a "thinking process" (chain-of-thought) before delivering its final response.
Key Capabilities & Features
- Chain-of-Thought Reasoning: Explicitly trained to produce intermediate reasoning steps, enhancing transparency and potentially accuracy for complex tasks.
- Portuguese Language Focus: Optimized for PT-BR, making it suitable for applications requiring high-quality responses in this language.
- Instruction Following: Designed to accurately follow user instructions.
- Low Computational Cost: Its 1.1B parameter size and 2048-token context window allow for efficient deployment.
- Completion Only Loss: Training focused the loss calculation only on the generated response and reasoning, preventing the model from "hallucinating" instructions.
- Special Token Integration: Uses
ChatMLand athinktrigger token to structure chain-of-thought generation.
Training Details
The model was trained using Supervised Fine-Tuning (SFT) with TRL (Transformer Reinforcement Learning) on a high-quality dataset of ~1,000 examples (corre-social/s1_dataset_ptbr_1k_tokenized). Techniques like bf16 precision, 8-bit AdamW optimizer, and gradient checkpointing were used for efficient training on GPUs.
Ideal Use Cases
- Educational Tools: Explaining concepts or problem-solving steps in Portuguese.
- Customer Support: Providing detailed, reasoned answers to user queries.
- Content Generation: Creating structured content that requires logical progression.
- Any application requiring explicit reasoning in Portuguese.