corre-social/Drummond-1b1-Instruct

Cold
Public
1.1B
BF16
2048
License: apache-2.0
Hugging Face
Overview

Drummond-1b1-Instruct: Portuguese Chain-of-Thought Reasoning

The Drummond-1b1-Instruct is a 1.1 billion parameter model developed by Corre Social, specifically fine-tuned for instruction following and reasoning in Portuguese (PT-BR). It is based on the Tucano-1b1-Instruct architecture and is optimized for generating a "thinking process" (chain-of-thought) before delivering its final response.

Key Capabilities & Features

  • Chain-of-Thought Reasoning: Explicitly trained to produce intermediate reasoning steps, enhancing transparency and potentially accuracy for complex tasks.
  • Portuguese Language Focus: Optimized for PT-BR, making it suitable for applications requiring high-quality responses in this language.
  • Instruction Following: Designed to accurately follow user instructions.
  • Low Computational Cost: Its 1.1B parameter size and 2048-token context window allow for efficient deployment.
  • Completion Only Loss: Training focused the loss calculation only on the generated response and reasoning, preventing the model from "hallucinating" instructions.
  • Special Token Integration: Uses ChatML and a think trigger token to structure chain-of-thought generation.

Training Details

The model was trained using Supervised Fine-Tuning (SFT) with TRL (Transformer Reinforcement Learning) on a high-quality dataset of ~1,000 examples (corre-social/s1_dataset_ptbr_1k_tokenized). Techniques like bf16 precision, 8-bit AdamW optimizer, and gradient checkpointing were used for efficient training on GPUs.

Ideal Use Cases

  • Educational Tools: Explaining concepts or problem-solving steps in Portuguese.
  • Customer Support: Providing detailed, reasoned answers to user queries.
  • Content Generation: Creating structured content that requires logical progression.
  • Any application requiring explicit reasoning in Portuguese.