Overview
SmolLM3-SFT is a 1-billion-parameter instruction-tuned language model published by gshasiri. It is a fine-tuned variant of the gshasiri/SmolLM3-Mid base model, trained with the TRL (Transformer Reinforcement Learning) library to strengthen its instruction-following behavior.
Key Capabilities
- Instruction Following: Optimized for understanding and responding to user instructions and prompts.
- Conversational AI: Designed to generate coherent and contextually relevant text in dialogue-based scenarios.
- Extended Context: Supports a 32,768-token context window, allowing it to process and generate long sequences of text.
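The capabilities above can be exercised through the standard transformers chat workflow. The sketch below is illustrative, not part of this card: the repo id `gshasiri/SmolLM3-SFT` is taken from the model name here, and the prompt and generation settings are assumptions.

```python
# Hedged sketch: single-turn chat inference with transformers.
# Only the model id and 32,768-token context length come from this card;
# everything else (prompt, max_new_tokens) is an illustrative assumption.

MODEL_ID = "gshasiri/SmolLM3-SFT"
MAX_CONTEXT = 32768  # context window stated in this card


def build_messages(user_prompt: str) -> list[dict]:
    """Assemble a single-turn conversation in the messages format
    expected by tokenizer.apply_chat_template."""
    return [{"role": "user", "content": user_prompt}]


if __name__ == "__main__":
    # Imports deferred so the helpers above can be used without
    # downloading the model weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    messages = build_messages("Summarize the benefits of supervised fine-tuning.")
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For multi-turn use, append the model's reply as an `{"role": "assistant", ...}` message and call the template again with the full history.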
Training Details
The model underwent Supervised Fine-Tuning (SFT) with the TRL library, training on instruction-style data to align its outputs with human instructions and preferences. The training environment used TRL 0.25.0, Transformers 4.57.1, and PyTorch 2.6.0+cu126.
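An SFT run of the kind described above can be sketched with TRL's `SFTTrainer`. This is a minimal sketch under stated assumptions: the card does not name the training dataset or hyperparameters, so the dataset id and all values in `sft_config_kwargs` are placeholders; only the base model id (`gshasiri/SmolLM3-Mid`) and the library versions come from this card.

```python
# Hedged sketch of supervised fine-tuning with TRL's SFTTrainer.
# Dataset and hyperparameters are placeholders, NOT the author's setup.


def sft_config_kwargs() -> dict:
    """Illustrative SFT hyperparameters (assumptions for this sketch)."""
    return {
        "output_dir": "SmolLM3-SFT",
        "per_device_train_batch_size": 2,
        "gradient_accumulation_steps": 8,
        "learning_rate": 2e-5,
        "num_train_epochs": 1,
        "max_length": 32768,  # matches the context length stated in this card
    }


if __name__ == "__main__":
    # Imports deferred so the config helper can be inspected without
    # requiring TRL or network access.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    # Placeholder conversational dataset; the card does not name the real one.
    dataset = load_dataset("trl-lib/Capybara", split="train")

    trainer = SFTTrainer(
        model="gshasiri/SmolLM3-Mid",  # base model named in this card
        args=SFTConfig(**sft_config_kwargs()),
        train_dataset=dataset,
    )
    trainer.train()
```

Passing the model as a string lets `SFTTrainer` load it internally; a pre-instantiated `AutoModelForCausalLM` works equally well when custom loading options are needed.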