gshasiri/SmolLM3-DPO-Second-Round
Text Generation · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Ctx Length: 32k · Published: Nov 27, 2025 · Architecture: Transformer · Warm

gshasiri/SmolLM3-DPO-Second-Round is a 1-billion-parameter language model fine-tuned by gshasiri with Direct Preference Optimization (DPO). It is a DPO-tuned version of gshasiri/SmolLM3-SFT-Second-Round, further aligning the supervised fine-tuned model's outputs with human preferences. With a context length of 32,768 tokens, it is suited to general text generation tasks where preference alignment is beneficial.
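
For reference, the sketch below shows one way to run the model locally with the Hugging Face transformers library. It assumes the checkpoint exposes the standard causal-LM interface and ships a chat template; neither is confirmed on this page, and the prompt is only an illustrative example.

```python
# Minimal generation sketch; assumes the standard transformers causal-LM
# interface and a bundled chat template (not confirmed by this model card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gshasiri/SmolLM3-DPO-Second-Round"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

# Hypothetical prompt, formatted through the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```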
