Name: Harvard-DCML/boomerang-llama-3.2-1.9B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Harvard-DCML

Model Overview

Harvard-DCML/boomerang-llama-3.2-1.9B is a 1.9-billion-parameter student model in the Llama family, developed by Harvard-DCML. It is produced with "boomerang distillation," a technique that creates intermediate-sized models by reincorporating layers from the teacher model (Llama-3.2-3B) into the student model without further training. This enables zero-shot interpolation between the student and teacher sizes, which gives flexibility when choosing a model for deployment.

Training Details

The model was initialized from Llama-3.2-3B by copying a subset of its layers, then distilled on 2.1 billion tokens from The Pile. Distillation matched the activations of the Llama-3.2-3B teacher using a combination of cross-entropy, KL, and cosine losses. Key training hyperparameters were a learning rate of 3e-4, a cosine learning rate scheduler, and an effective batch size of 2048 over 500 training steps.
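
To make the combined objective concrete, the following is a minimal sketch of how a cross-entropy term, a KL term, and a cosine term might be summed for a single token position. It uses only the Python standard library and is not the authors' training code. The function names, the weights w_ce, w_kl, and w_cos, and the choice to apply the cosine term to one pair of hidden-state vectors are all assumptions made for illustration; the source does not state how the three terms are weighted or which activations are compared.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, target_index):
    # Negative log-likelihood of the ground-truth token under the student.
    probs = softmax(student_logits)
    return -math.log(probs[target_index])

def kl_divergence(teacher_logits, student_logits):
    # KL(teacher || student) between the two next-token distributions.
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cosine_distance(teacher_hidden, student_hidden):
    # 1 - cosine similarity between two hidden-state vectors.
    dot = sum(a * b for a, b in zip(teacher_hidden, student_hidden))
    norm_t = math.sqrt(sum(a * a for a in teacher_hidden))
    norm_s = math.sqrt(sum(b * b for b in student_hidden))
    return 1.0 - dot / (norm_t * norm_s)

def distillation_loss(student_logits, teacher_logits, target_index,
                      student_hidden, teacher_hidden,
                      w_ce=1.0, w_kl=1.0, w_cos=1.0):
    # Weighted sum of the three terms; the weights are hypothetical.
    return (w_ce * cross_entropy(student_logits, target_index)
            + w_kl * kl_divergence(teacher_logits, student_logits)
            + w_cos * cosine_distance(teacher_hidden, student_hidden))
```

When the student reproduces the teacher exactly, the KL and cosine terms vanish and only the cross-entropy term remains, which is a quick sanity check for any implementation of this kind of objective.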

Key Capabilities and Use Cases

The main use of this model is as the student in the boomerang distillation framework. Developers can combine it with its teacher, Llama-3.2-3B, to construct language models of custom size. The provided build_intermediate_model function does this: the number of patched layers determines the size and performance characteristics of the resulting model. This is particularly useful for fitting a model to specific compute constraints or performance requirements without additional training.
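
The idea of patching layers can be illustrated with a toy layer plan. The sketch below is not the build_intermediate_model function, whose signature is not given here. It assumes that each student layer stands in for a contiguous block of teacher layers and that patching swaps a student layer for its teacher block, starting from the last layer; the helper build_layer_plan, the mapping toy_blocks, and the patch order are all hypothetical.

```python
def build_layer_plan(student_to_teacher_blocks, num_patched):
    """Return layer labels for an intermediate model.

    student_to_teacher_blocks[i] lists the teacher layer indices that
    student layer i stands in for (a hypothetical mapping). The last
    `num_patched` student layers are swapped for their teacher blocks.
    """
    n = len(student_to_teacher_blocks)
    if not 0 <= num_patched <= n:
        raise ValueError("num_patched must be between 0 and the student depth")
    plan = []
    for i, block in enumerate(student_to_teacher_blocks):
        if i >= n - num_patched:
            # Patched: reinsert the teacher layers this student layer replaced.
            plan.extend(f"teacher[{t}]" for t in block)
        else:
            # Unpatched: keep the distilled student layer.
            plan.append(f"student[{i}]")
    return plan

# Toy mapping: 4 student layers covering 8 teacher layers (not the real layout).
toy_blocks = [[0, 1], [2, 3], [4, 5], [6, 7]]
for k in range(len(toy_blocks) + 1):
    print(k, build_layer_plan(toy_blocks, k))
```

With zero patched layers the plan is the pure student, and with every layer patched it is the full teacher stack. Each value in between gives one intermediate depth, which is the sense in which model size can be interpolated.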
