Model Overview
sampluralis/llama-sft-muon is a fine-tuned language model derived from the gshasiri/SmolLM3-Mid base model. Developed by sampluralis, it was trained with Supervised Fine-Tuning (SFT) using the Hugging Face TRL (Transformer Reinforcement Learning) library, with the goal of improving its text generation capabilities for conversational and generative AI applications.
Key Capabilities
- General Text Generation: Capable of generating coherent and contextually relevant text based on user prompts.
- Fine-tuned Performance: Benefits from SFT on its base model, suggesting improved response quality and adherence to instructions.
- Ease of Use: Integrates directly with the Hugging Face pipeline API for quick deployment in text generation tasks.
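As a minimal usage sketch, the model can be loaded with the Hugging Face `pipeline` API. The prompt below is an illustrative assumption, not part of the original card:

```python
def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat-message format expected by chat-style pipelines."""
    return [{"role": "user", "content": user_prompt}]

if __name__ == "__main__":
    # Requires `pip install transformers` and will download model weights.
    from transformers import pipeline

    generator = pipeline("text-generation", model="sampluralis/llama-sft-muon")
    messages = build_messages("Explain supervised fine-tuning in one sentence.")
    output = generator(messages, max_new_tokens=128)
    print(output[0]["generated_text"])
```

Passing a list of chat messages (rather than a raw string) lets the pipeline apply the model's chat template automatically.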
Training Details
The model was trained using SFT, leveraging specific versions of key frameworks:
- TRL: 0.28.0
- Transformers: 4.57.6
- PyTorch: 2.6.0+cu126
- Datasets: 4.6.0
- Tokenizers: 0.22.2
Pinning these framework versions supports stable, reproducible training runs. The training process can be visualized on Weights & Biases via the link in the original repository.
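A training run of this shape can be sketched with TRL's `SFTTrainer`. The dataset name and hyperparameters below are illustrative assumptions; the card does not disclose the actual training recipe:

```python
def to_chat_example(prompt: str, completion: str) -> dict:
    """Convert a prompt/completion pair into the chat `messages` format TRL accepts."""
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]
    }

if __name__ == "__main__":
    # Requires `pip install trl datasets`; dataset and hyperparameters are
    # placeholders, not the values used for sampluralis/llama-sft-muon.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    train_dataset = load_dataset("trl-lib/Capybara", split="train")  # example dataset
    config = SFTConfig(
        output_dir="llama-sft-muon",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        report_to="wandb",  # logs training curves to Weights & Biases
    )
    trainer = SFTTrainer(
        model="gshasiri/SmolLM3-Mid",  # the base model named in the card
        args=config,
        train_dataset=train_dataset,
    )
    trainer.train()
```

Setting `report_to="wandb"` is what produces the Weights & Biases dashboard mentioned above.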