sajjadiba/TinyLlama-1.1B-Chat-SFT-ultrachat3k-DPO-argilla6k

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.1BQuant:BF16Ctx Length:2kPublished:May 13, 2026Architecture:Transformer0.0K Warm

sajjadiba/TinyLlama-1.1B-Chat-SFT-ultrachat3k-DPO-argilla6k is a 1.1 billion parameter TinyLlama-based causal language model fine-tuned for chat. This model leverages Supervised Fine-Tuning (SFT) on 3,000 samples from ultrachat_200k and further optimizes conversational quality through Direct Preference Optimization (DPO) on 6,000 samples from argilla/distilabel-intel-orca-dpo-pairs. It is designed for research into small-scale conversational AI, offering a compact solution for chat-based applications with a 2048 token context length.

Loading preview...

Model Overview

This model, sajjadiba/TinyLlama-1.1B-Chat-SFT-ultrachat3k-DPO-argilla6k, is a compact 1.1 billion parameter language model built upon the TinyLlama architecture. It has been specifically fine-tuned for conversational applications, making it suitable for research and development in small-scale chat systems.

Training Methodology

The model's conversational capabilities are developed through a two-stage fine-tuning process:

  • Supervised Fine-Tuning (SFT): Initial training was performed on 3,000 samples sourced from the ultrachat_200k dataset, establishing foundational chat behaviors.
  • Direct Preference Optimization (DPO): Further refinement was achieved using 6,000 samples from the argilla/distilabel-intel-orca-dpo-pairs dataset, enhancing response quality and alignment with preferred conversational styles.

Key Characteristics

  • Base Model: Fine-tuned from TinyLlama/TinyLlama-1.1B-Chat-v1.0.
  • Parameter Count: 1.1 billion parameters, offering a lightweight solution.
  • Context Length: Supports a context window of 2048 tokens.

Important Considerations

WARNING: This model is a research artifact and lacks safety guardrails. It may generate harmful, dangerous, or inappropriate content. It is not recommended for production use or public deployment without implementing robust safety measures such as content filtering and supervised moderation.