sajjadiba/TinyLlama-1.1B-Chat-SFT-ultrachat3k-DPO-argilla6k
sajjadiba/TinyLlama-1.1B-Chat-SFT-ultrachat3k-DPO-argilla6k is a 1.1 billion parameter TinyLlama-based causal language model fine-tuned for chat. This model leverages Supervised Fine-Tuning (SFT) on 3,000 samples from ultrachat_200k and further optimizes conversational quality through Direct Preference Optimization (DPO) on 6,000 samples from argilla/distilabel-intel-orca-dpo-pairs. It is designed for research into small-scale conversational AI, offering a compact solution for chat-based applications with a 2048 token context length.
Loading preview...
Model Overview
This model, sajjadiba/TinyLlama-1.1B-Chat-SFT-ultrachat3k-DPO-argilla6k, is a compact 1.1 billion parameter language model built upon the TinyLlama architecture. It has been specifically fine-tuned for conversational applications, making it suitable for research and development in small-scale chat systems.
Training Methodology
The model's conversational capabilities are developed through a two-stage fine-tuning process:
- Supervised Fine-Tuning (SFT): Initial training was performed on 3,000 samples sourced from the
ultrachat_200kdataset, establishing foundational chat behaviors. - Direct Preference Optimization (DPO): Further refinement was achieved using 6,000 samples from the
argilla/distilabel-intel-orca-dpo-pairsdataset, enhancing response quality and alignment with preferred conversational styles.
Key Characteristics
- Base Model: Fine-tuned from
TinyLlama/TinyLlama-1.1B-Chat-v1.0. - Parameter Count: 1.1 billion parameters, offering a lightweight solution.
- Context Length: Supports a context window of 2048 tokens.
Important Considerations
WARNING: This model is a research artifact and lacks safety guardrails. It may generate harmful, dangerous, or inappropriate content. It is not recommended for production use or public deployment without implementing robust safety measures such as content filtering and supervised moderation.