invalid-coder/TinyLlama-1.1B-intermediate-step-1431k-3T-laser-dpo
invalid-coder/TinyLlama-1.1B-intermediate-step-1431k-3T-laser-dpo is a 1.1-billion-parameter Llama-architecture model trained with a technique based on laserRMT, which is intended to prevent catastrophic forgetting when teaching specific skills such as function calling. It builds on an intermediate checkpoint from the TinyLlama project, which aims to pretrain a 1.1B Llama model on 3 trillion tokens. Its compact size and Llama 2 compatibility suit applications with tight compute and memory budgets.
TinyLlama-1.1B-intermediate-step-1431k-3T-laser-dpo Overview
This model is based on an intermediate checkpoint from the TinyLlama project, which pretrains a 1.1-billion-parameter Llama-architecture model on 3 trillion tokens. It uses the same architecture and tokenizer as Llama 2, so it works with many open-source projects built on Llama.
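Because the architecture and tokenizer match Llama 2, the checkpoint should load through the generic `Auto*` classes of the Hugging Face `transformers` library. The following is a minimal sketch under that assumption: the repository ID is taken from the title above, the helper name `generate_text` is hypothetical, and no particular prompt format is assumed.

```python
MODEL_ID = "invalid-coder/TinyLlama-1.1B-intermediate-step-1431k-3T-laser-dpo"


def generate_text(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the checkpoint and return the prompt plus its completion.

    Hypothetical helper for illustration; it reloads the model on every call,
    so a real application would load once and reuse the objects.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Plain text prompt; the expected prompt template is not stated in the card.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("The capital of France is")` downloads the weights on first use and needs both `transformers` and PyTorch installed.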
Key Differentiators & Capabilities
- Catastrophic Forgetting Prevention: Uses a training technique based on laserRMT that partially freezes the model according to a laser-like analysis. Keeping part of the weights fixed is intended to stop the model from forgetting previously acquired knowledge, which matters when teaching specific skills such as function calling.
- Compact Size: With only 1.1 billion parameters, TinyLlama targets applications with a restricted compute and memory footprint.
- Llama 2 Compatibility: It shares Llama 2's architecture and tokenizer, so it can plug into many Llama-based ecosystems.
- Extensive Pretraining: The underlying checkpoint was pretrained on 3 trillion tokens, a large budget for a model of this size.
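The partial freezing mentioned in the first bullet can be illustrated in general terms. The sketch below is not the laserRMT procedure itself: it assumes some per-layer score is already available (the scores and layer names used in the usage note are made up), keeps the top-scoring layers trainable, and freezes the rest. The helpers `pick_trainable` and `freeze_except` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set


def pick_trainable(scores: Dict[str, float], keep: int) -> Set[str]:
    """Return the `keep` layer-name prefixes with the highest score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])


def freeze_except(model, trainable_prefixes: Iterable[str]) -> List[str]:
    """Freeze every parameter that does not belong to a chosen layer.

    `model` only needs a `named_parameters()` method yielding (name, param)
    pairs with an assignable `requires_grad`, as torch.nn.Module provides.
    Returns the names of the parameters left trainable.
    """
    # Append "." so that "model.layers.1" does not also match "model.layers.10".
    prefixes = tuple(p + "." for p in trainable_prefixes)
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(prefixes)
        if param.requires_grad:
            trainable.append(name)
    return trainable
```

With a loaded PyTorch model, `freeze_except(model, pick_trainable({"model.layers.3": 0.8, "model.layers.7": 0.2}, keep=1))` would leave only the parameters under `model.layers.3` trainable. How the scores are computed is the part that laserRMT defines and this sketch leaves open.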
Performance Highlights
Benchmark scores improve as the number of training tokens grows. For instance, the TinyLlama-1.1B-intermediate-step-1431k-3T checkpoint reaches an average score of 52.99 across HellaSwag, OpenBookQA (Obqa), WinoGrande, ARC-Challenge (ARC_c), ARC-Easy (ARC_e), BoolQ, and PIQA, which is competitive for its parameter count.
Ideal Use Cases
This model is particularly well-suited for scenarios where:
- Computational resources and memory are limited.
- Integration with Llama 2-based applications is desired.
- The ability to learn and retain specific skills, like function calling, without catastrophic forgetting is crucial.
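To judge whether the model fits a memory-limited target, a rough estimate of the weight storage is often enough. The sketch below multiplies the stated 1.1 billion parameters by the bytes each parameter takes in a few common precisions. It covers weights only: activations, the KV cache, and the per-block metadata of quantized formats are ignored, so real usage will be higher.

```python
PARAMS = 1_100_000_000  # 1.1 billion parameters, as stated for this model

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gb(PARAMS, dtype):.2f} GB")
```

Under these assumptions the weights take about 4.4 GB in float32 and about 2.2 GB in float16.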