erbacher/Llama-3.2-Tulu-3-1B-SFT

Text Generation · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Context Length: 32k · Published: Dec 23, 2024 · Architecture: Transformer

erbacher/Llama-3.2-Tulu-3-1B-SFT is a 1-billion-parameter model based on Meta's Llama-3.2-1B architecture, fine-tuned by erbacher on AllenAI's Tulu-3 dataset for instruction-following tasks. It specializes in conversational AI and general instruction adherence, offering a compact option for applications that need robust response generation.


Model Overview

erbacher/Llama-3.2-Tulu-3-1B-SFT is a 1-billion-parameter language model built on Meta's Llama-3.2-1B architecture. It has undergone full-parameter supervised fine-tuning on AllenAI's Tulu-3 data, specifically the allenai/tulu-3-sft-mixture dataset.
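A minimal inference sketch with the Hugging Face transformers library is shown below. The model id comes from this card; the prompt, generation parameters, and the assumption that the checkpoint ships a Tulu-style chat template are illustrative, not part of the card.

```python
# Sketch: chat-style inference with transformers (assumes the checkpoint
# includes a chat template, as Tulu SFT models typically do).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "erbacher/Llama-3.2-Tulu-3-1B-SFT"


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Run one user turn through the model and return the assistant reply."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")
    messages = [{"role": "user", "content": prompt}]
    # Render the conversation with the model's chat template and append
    # the assistant header so generation starts at the reply.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Give me three tips for writing clear commit messages."))
```

The `__main__` guard keeps the download and generation out of imports, so the helper can be reused from other scripts.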

Key Capabilities

  • Instruction Following: Enhanced ability to understand and execute instructions due to fine-tuning on the Tulu-3 dataset.
  • Llama-3.2 Architecture: Benefits from the foundational capabilities and efficiency of the Llama-3.2 base model.
  • Compact Size: At 1 billion parameters, it offers a balance between performance and computational efficiency, suitable for deployment in resource-constrained environments.
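The compact-size claim can be made concrete with back-of-the-envelope arithmetic: at BF16 (2 bytes per parameter), 1 billion parameters need roughly 2 GB just for weights, before activations and KV cache. The helper below is an illustrative sketch, not part of the card.

```python
# Rough weight-memory arithmetic for the "compact size" point above.
def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


print(weight_memory_gb(1_000_000_000))  # BF16 → 2.0 GB of weights
```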

Training Details

The model was trained on 4x NVIDIA A100 80GB GPUs. The training configuration included a learning rate of 1.0e-5, a linear LR scheduler, and 2 epochs of training with a per-device batch size of 8 and gradient accumulation steps of 2. Gradient checkpointing was utilized to optimize memory usage during training.
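Taken together, these hyperparameters imply an effective batch size of per-device batch × gradient-accumulation steps × number of GPUs = 8 × 2 × 4 = 64 examples per optimizer step. A quick sketch of that arithmetic:

```python
# Effective batch size implied by the training configuration above:
# 8 examples per device, 2 accumulation steps, 4 A100 GPUs.
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    return per_device * grad_accum * num_gpus


print(effective_batch_size(8, 2, 4))  # → 64
```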