Model Overview
gshasiri/SmolLM3-DPO-Second-Round is a 3-billion-parameter language model developed by gshasiri. It is a fine-tuned iteration of gshasiri/SmolLM3-SFT-Second-Round, further trained with Direct Preference Optimization (DPO). DPO aligns the model's outputs more closely with human preferences, making its responses potentially more helpful and desirable.
Key Training Details
- Base Model: Fine-tuned from gshasiri/SmolLM3-SFT-Second-Round.
- Optimization Method: Direct Preference Optimization (DPO), a technique introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model".
- Framework: Trained with the TRL (Transformer Reinforcement Learning) library.
- Context Length: Supports a context window of 32,768 tokens.
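DPO optimizes a contrastive objective over preference pairs directly, instead of first training a separate reward model. A minimal numeric sketch of the per-example loss (the function name and the log-probability values below are illustrative, not taken from this model's training run):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen ratio - rejected ratio)).

    Each argument is the total log-probability of a response under the
    policy being trained or the frozen reference (SFT) model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp      # log pi/pi_ref for preferred
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # same for dispreferred
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(x) == log(1 + exp(-x)); small when the policy prefers "chosen"
    return math.log(1.0 + math.exp(-logits))

# Policy slightly favors the chosen response relative to the reference model:
print(round(dpo_loss(-10.0, -12.0, -11.0, -11.0), 3))  # → 0.598
```

The loss shrinks as the policy assigns relatively more probability to the preferred response than the reference model does, which is what nudges outputs toward human preferences.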
Potential Use Cases
This model is well-suited for applications requiring:
- General Text Generation: Producing coherent and contextually relevant text.
- Preference-Aligned Responses: Generating outputs that are more aligned with human preferences due to DPO training.
- Interactive AI Systems: Applications such as chat assistants, where the quality and desirability of generated responses are important.
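The preference alignment described above comes from training on preference pairs: a prompt plus a chosen (preferred) and a rejected response, which is the column layout TRL's DPOTrainer expects. A small illustrative record (the texts below are invented, not from this model's training data):

```python
# One preference-pair record in the "prompt"/"chosen"/"rejected" layout
# used by TRL's DPOTrainer. The example texts are hypothetical.
preference_example = {
    "prompt": "Explain what a context window is.",
    "chosen": "A context window is the maximum number of tokens a model "
              "can attend to at once when generating a response.",
    "rejected": "It is a window.",
}

def is_valid_pair(record: dict) -> bool:
    """Check that a record has the three required, non-empty string fields."""
    required = ("prompt", "chosen", "rejected")
    return all(isinstance(record.get(key), str) and record[key] for key in required)

print(is_valid_pair(preference_example))  # → True
```

During DPO training, the model is pushed to assign higher relative probability to the "chosen" response than to the "rejected" one for each prompt.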