gshasiri/SmolLM3-DPO-Second-Round-no-think
gshasiri/SmolLM3-DPO-Second-Round-no-think is a 1-billion-parameter language model developed by gshasiri and fine-tuned with Direct Preference Optimization (DPO) to improve response quality. It is derived from gshasiri/SmolLM3-SFT-Second-Round and supports a 32,768-token context length. The model is optimized for generating coherent, contextually relevant text, making it suitable for general text generation tasks.
Model Overview
The gshasiri/SmolLM3-DPO-Second-Round-no-think is a 1-billion-parameter language model developed by gshasiri. It is a fine-tuned iteration of the gshasiri/SmolLM3-SFT-Second-Round model, enhanced using Direct Preference Optimization (DPO). This training methodology, introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," aligns the model's outputs more closely with human preferences.
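For reference, the DPO objective from that paper trains the policy \(\pi_\theta\) against a frozen reference model \(\pi_{\mathrm{ref}}\) (here, presumably the SFT checkpoint) on triples of a prompt \(x\), a preferred response \(y_w\), and a dispreferred response \(y_l\), with \(\beta\) controlling the strength of the implicit KL penalty:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```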
Key Capabilities
- Preference-aligned Text Generation: Leverages DPO training to produce responses that are generally preferred over those from its SFT-trained predecessor; a minimal loading-and-generation sketch follows this list.
- Contextual Understanding: Benefits from a 32,768-token context length, allowing it to process and generate longer, more coherent texts.
- TRL Framework: Developed with the TRL (Transformer Reinforcement Learning) library, Hugging Face's toolkit for preference-based fine-tuning methods such as DPO.
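The sketch below shows one way to load the model and generate a response with Transformers, assuming a standard causal-LM checkpoint with a chat template; the prompt and generation settings are illustrative, not the author's recommendations.

```python
# Minimal sketch: load the model and generate a single response.
# Prompt and decoding settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gshasiri/SmolLM3-DPO-Second-Round-no-think"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bf16 support
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```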
Training Details
The model was trained with the DPO method using the TRL framework (version 0.25.1), alongside Transformers (4.57.1), PyTorch (2.6.0+cu126), Datasets (4.4.1), and Tokenizers (0.22.1).
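As an illustration of how such a DPO stage is typically set up with these library versions, the sketch below pairs the SFT checkpoint with TRL's DPOTrainer. The dataset name and every hyperparameter are hypothetical placeholders, not the author's actual recipe.

```python
# Illustrative TRL DPO fine-tuning setup; dataset and hyperparameters
# are hypothetical placeholders, not the author's configuration.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "gshasiri/SmolLM3-SFT-Second-Round"  # SFT model used as the starting point
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# A preference dataset with "prompt", "chosen", and "rejected" columns,
# the format DPOTrainer expects; the dataset name is a placeholder.
dataset = load_dataset("your-org/your-preference-data", split="train")

config = DPOConfig(
    output_dir="SmolLM3-DPO-Second-Round-no-think",
    beta=0.1,                       # strength of the implicit KL penalty
    per_device_train_batch_size=2,  # placeholder values throughout
    learning_rate=5e-7,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,           # with no ref_model given, TRL clones a frozen reference
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```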
Use Cases
This model is well-suited for applications requiring high-quality, preference-aligned text generation, such as chatbots, content creation, and interactive AI systems where nuanced and contextually appropriate responses are crucial.
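For chat-style applications like these, one convenient option is the Transformers text-generation pipeline, which accepts chat messages directly; the conversation below is a minimal illustrative sketch.

```python
# Minimal chat-style usage sketch via the transformers pipeline API.
# The conversation content and generation settings are illustrative only.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="gshasiri/SmolLM3-DPO-Second-Round-no-think",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Draft a two-line product blurb for a travel app."},
]

# Recent transformers pipelines apply the model's chat template to
# message lists automatically before generation.
result = pipe(messages, max_new_tokens=100)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```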