erbacher/Llama-3.2-Tulu-3-1B-SFT

Text Generation · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Context Length: 32k · Published: Dec 23, 2024 · Architecture: Transformer

erbacher/Llama-3.2-Tulu-3-1B-SFT is a 1-billion-parameter model based on Meta's Llama-3.2-1B architecture, fine-tuned by erbacher on AllenAI's Tulu-3 dataset for instruction-following tasks. It specializes in conversational AI and general instruction adherence, offering a compact option for applications that need robust response generation.


Model Overview

erbacher/Llama-3.2-Tulu-3-1B-SFT is a 1-billion-parameter language model built on Meta's Llama-3.2-1B architecture. It has undergone full-parameter supervised fine-tuning on AllenAI's Tulu-3 data, specifically the allenai/tulu-3-sft-mixture dataset.
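A minimal inference sketch with the Hugging Face transformers library is shown below. The model id comes from this card; the prompt, generation parameters, and the assumption that the checkpoint ships a Tulu-style chat template are illustrative, not part of the card.

```python
# Sketch: chat-style inference with transformers (assumes the checkpoint
# includes a chat template, as Tulu SFT models typically do).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "erbacher/Llama-3.2-Tulu-3-1B-SFT"


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Run one user turn through the model and return the assistant reply."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")
    messages = [{"role": "user", "content": prompt}]
    # Render the conversation with the model's chat template and append
    # the assistant header so generation starts at the reply.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Give me three tips for writing clear commit messages."))
```

The `__main__` guard keeps the download and generation out of imports, so the helper can be reused from other scripts.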

Key Capabilities

  • Instruction Following: Enhanced ability to understand and execute instructions due to fine-tuning on the Tulu-3 dataset.
  • Llama-3.2 Architecture: Benefits from the foundational capabilities and efficiency of the Llama-3.2 base model.
  • Compact Size: At 1 billion parameters, it offers a balance between performance and computational efficiency, suitable for deployment in resource-constrained environments.
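The compact-size claim can be made concrete with back-of-the-envelope arithmetic: at BF16 (2 bytes per parameter), 1 billion parameters need roughly 2 GB just for weights, before activations and KV cache. The helper below is an illustrative sketch, not part of the card.

```python
# Rough weight-memory arithmetic for the "compact size" point above.
def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


print(weight_memory_gb(1_000_000_000))  # BF16 → 2.0 GB of weights
```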

Training Details

The model was trained on 4x NVIDIA A100 80GB GPUs. The training configuration included a learning rate of 1.0e-5, a linear LR scheduler, and 2 epochs of training with a per-device batch size of 8 and gradient accumulation steps of 2. Gradient checkpointing was utilized to optimize memory usage during training.
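Taken together, these hyperparameters imply an effective batch size of per-device batch × gradient-accumulation steps × number of GPUs = 8 × 2 × 4 = 64 examples per optimizer step. A quick sketch of that arithmetic:

```python
# Effective batch size implied by the training configuration above:
# 8 examples per device, 2 accumulation steps, 4 A100 GPUs.
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    return per_device * grad_accum * num_gpus


print(effective_batch_size(8, 2, 4))  # → 64
```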