gshasiri/SmolLM3-SFT
SmolLM3-SFT by gshasiri is a 1-billion-parameter instruction-tuned causal language model, fine-tuned from gshasiri/SmolLM3-Mid using the TRL framework. The model is optimized for conversational AI and instruction following, and its compact size makes it efficient to deploy. With a 32,768-token context length, it is suited to applications that process long prompts and generate coherent, extended responses.
Overview
SmolLM3-SFT is a 1-billion-parameter instruction-tuned language model developed by gshasiri. It is a fine-tuned variant of the gshasiri/SmolLM3-Mid base model, trained with the TRL (Transformer Reinforcement Learning) framework to strengthen its instruction-following capabilities.
Key Capabilities
- Instruction Following: Optimized for understanding and responding to user instructions and prompts.
- Conversational AI: Designed to generate coherent and contextually relevant text in dialogue-based scenarios.
- Extended Context: Supports a context length of 32,768 tokens, allowing it to process and generate long sequences of text.
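As a sketch of how the model could be used for chat-style inference, the snippet below follows the standard `transformers` chat-template API. The generation settings are illustrative assumptions, not values specified by this card:

```python
# Minimal inference sketch for gshasiri/SmolLM3-SFT. Generation
# settings here are illustrative, not from the model card.

MODEL_ID = "gshasiri/SmolLM3-SFT"


def build_chat(prompt: str) -> list:
    """Wrap a user prompt in the chat-message format expected by
    tokenizer.apply_chat_template()."""
    return [{"role": "user", "content": prompt}]


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model, apply its chat template, and generate a reply."""
    # Imported here so the lightweight helper above can be used
    # without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer.apply_chat_template(
        build_chat(prompt),
        add_generation_prompt=True,
        return_tensors="pt",
    )
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        outputs[0][inputs.shape[-1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("Explain supervised fine-tuning in one sentence."))
```

For multi-turn use, extend the message list with alternating `"user"` and `"assistant"` entries before applying the chat template.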
Training Details
The model underwent Supervised Fine-Tuning (SFT) using the TRL library, a process that trains on instruction datasets to align the model's outputs with human instructions and preferences. Training used TRL 0.25.0, Transformers 4.57.1, and PyTorch 2.6.0+cu126.
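The SFT step described above follows TRL's standard `SFTTrainer` workflow, sketched below. The dataset name and hyperparameters are placeholders, since the card does not specify the actual training configuration; only the base model name comes from this card:

```python
# Illustrative TRL supervised fine-tuning sketch (modern SFTTrainer API).
# The dataset and hyperparameters are placeholders, not the actual
# training configuration, which this model card does not specify.


def make_training_config() -> dict:
    """Return the (placeholder) SFT hyperparameters used in this sketch."""
    return {
        "output_dir": "SmolLM3-SFT",
        "per_device_train_batch_size": 4,
        "num_train_epochs": 1,
        "learning_rate": 2e-5,
    }


def train():
    # Imported inside the function so the config helper above can be
    # inspected without TRL or datasets installed.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    # Placeholder instruction dataset; the real training data is unknown.
    dataset = load_dataset("trl-lib/Capybara", split="train")
    trainer = SFTTrainer(
        model="gshasiri/SmolLM3-Mid",  # base model named in this card
        args=SFTConfig(**make_training_config()),
        train_dataset=dataset,
    )
    trainer.train()


if __name__ == "__main__":
    train()
```

`SFTTrainer` accepts a model ID string and handles tokenization and chat-template formatting of conversational datasets internally, which is why no explicit tokenizer setup appears in the sketch.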