wassname/llama-3.2-3b-sft

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:3.2BQuant:BF16Ctx Length:32kPublished:Jun 4, 2025Architecture:Transformer0.0K Warm

The wassname/llama-3.2-3b-sft model is a fine-tuned version of tanliboy/Llama-3.2-3B, developed by wassname. This 3 billion parameter model is specifically trained using Supervised Fine-Tuning (SFT) on the wassname/ultrachat_200k_filtered dataset. It serves as an intermediate SFT model, making it suitable for subsequent DPO (Direct Preference Optimization) experiments. Its primary use case is as a foundational SFT layer for advanced alignment techniques.

Loading preview...

Model Overview

The wassname/llama-3.2-3b-sft model is a 3 billion parameter language model developed by wassname. It is a supervised fine-tuned (SFT) variant of the tanliboy/Llama-3.2-3B base model. The fine-tuning process utilized the wassname/ultrachat_200k_filtered dataset and was performed using the TRL (Transformer Reinforcement Learning) library.

Key Characteristics

  • SFT Foundation: This model is explicitly designed as an SFT layer, which is a prerequisite for advanced alignment techniques like Direct Preference Optimization (DPO).
  • Dataset: Fine-tuned on the wassname/ultrachat_200k_filtered dataset, indicating a focus on conversational or instruction-following capabilities.
  • Training Framework: Developed using Hugging Face's TRL library, known for its tools in reinforcement learning from human feedback (RLHF) and alignment.

Intended Use Cases

  • DPO Experiments: Ideal for researchers and developers looking to conduct DPO experiments, as it provides the necessary SFT baseline.
  • Intermediate Alignment Step: Serves as a crucial intermediate model in the pipeline for training more advanced, preference-aligned language models.
  • Instruction Following: Given its training on an UltraChat-derived dataset, it can be used for basic instruction-following tasks, though its primary purpose is as an SFT base.