chargoddard/llama3-42b-v0
chargoddard/llama3-42b-v0 is a 42 billion parameter base language model pruned from Meta's Llama 3 70B, retaining an 8192-token context length. This model was created using the methodology from "The Unreasonable Ineffectiveness of the Deeper Layers" and further trained with QLoRA on the MiniPile dataset. It is designed as an experimental foundation model, not instruction-tuned, and is intended for developers exploring efficient large language model architectures.
chargoddard/llama3-42b-v0: A Pruned Llama 3 Base Model
This model is an experimental 42 billion parameter base language model derived from Meta's Llama 3 70B. It was created by pruning the original 70B model following the methodology described in the paper "The Unreasonable Ineffectiveness of the Deeper Layers", with the layers to remove selected using PruneMe.
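The layer-selection idea from the cited paper can be sketched as follows: compare the hidden state entering a block of n consecutive layers with the hidden state leaving it, and remove the block where the two are most similar by angular distance. This is a minimal illustration under stated assumptions, not the PruneMe implementation. The function names and the toy hidden states are hypothetical, and a real run would average the distance over many text samples rather than use a single vector per layer.

```python
import math

def angular_distance(a, b):
    """Angular distance in [0, 1] between two vectors: arccos(cosine similarity) / pi."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_sim = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_sim) / math.pi

def find_block_to_prune(hidden_states, n):
    """Return (start, distance) for the n-layer block whose input and output are closest.

    hidden_states[l] is the (hypothetical) last-token representation entering
    layer l; the final entry is the output of the last layer. Layers
    start .. start + n - 1 are the candidates for removal.
    """
    num_layers = len(hidden_states) - 1
    if not 0 < n <= num_layers:
        raise ValueError("block size must be between 1 and the number of layers")
    best_start, best_dist = None, float("inf")
    for start in range(num_layers - n + 1):
        dist = angular_distance(hidden_states[start], hidden_states[start + n])
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist

# Toy 4-layer example with 2-dimensional states (made-up numbers).
toy_states = [[1.0, 0.0], [0.0, 1.0], [0.1, 1.0], [0.2, 1.0], [1.0, 1.0]]
start, dist = find_block_to_prune(toy_states, n=2)
print(f"prune layers {start}..{start + 1}, angular distance {dist:.3f}")
```

In this toy data the representation barely changes across layers 1 and 2, so that block is the one selected. After removing a block, the paper applies a short "healing" fine-tune, which corresponds to the QLoRA step described for this model.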
Key Characteristics & Performance
- Architecture: Pruned Llama 3 70B base model, resulting in 42 billion parameters.
- Training: After pruning, the model was further trained with QLoRA on approximately 100 million tokens from the JeanKaddour/minipile dataset.
- Base Model Nature: This is a base model and is not instruction-tuned. Using Llama 3 instruction formats will yield unpredictable results.
- Context Length: Maintains an 8192-token context window.
- Evaluated Performance: Initial evaluations show promising results for its size, including:
  - MMLU (0-shot): 0.7669
  - Winogrande (5-shot): 0.8027
  - Hellaswag (10-shot): 0.8025 (acc_norm)
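Because this is a base model, prompts should be plain text continuations rather than Llama 3 chat templates. The sketch below shows one way to do that with the Hugging Face transformers library, assuming transformers, torch, and accelerate are installed and that enough GPU memory is available for a 42B model. The helper names `build_few_shot_prompt` and `complete` are hypothetical, and the "Q:/A:" layout is only one possible plain-text format.

```python
MODEL_ID = "chargoddard/llama3-42b-v0"

def build_few_shot_prompt(examples, query):
    """Build a plain-text few-shot prompt with no chat or instruction tokens."""
    blocks = [f"Q: {question}\nA: {answer}\n" for question, answer in examples]
    blocks.append(f"Q: {query}\nA:")
    return "\n".join(blocks)

def complete(prompt, max_new_tokens=64):
    """Greedy completion of a plain-text prompt (downloads the full model weights)."""
    # Imported lazily so the prompt helper works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # requires the accelerate package
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

prompt = build_few_shot_prompt([("What is the capital of France?", "Paris")],
                               "What is the capital of Japan?")
print(prompt)
```

Calling `complete(prompt)` would then return the model's continuation. The prompt plus generated tokens should stay within the 8192-token context window.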
Intended Use Cases
- Research & Experimentation: Ideal for researchers and developers exploring model pruning techniques and their impact on performance.
- Foundation for Fine-tuning: Serves as a compact base model for custom fine-tuning tasks where a smaller footprint than the full Llama 3 70B is desired.
- Efficiency Studies: Useful for evaluating the trade-offs between model size, training data, and performance in pruned architectures.
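For size trade-off studies like those listed above, it can help to estimate how the parameter count scales with the number of retained transformer layers. The sketch below is a back-of-the-envelope calculation. The configuration values are an assumption taken from the published Llama 3 70B configuration rather than from this page, and the function name `estimate_params` is hypothetical. The number of layers actually kept in this pruned model is not stated here, so the loop only shows the trend.

```python
# Assumed Llama 3 70B configuration values (not stated in this document).
LLAMA3_70B = {
    "hidden_size": 8192,
    "intermediate_size": 28672,
    "num_attention_heads": 64,
    "num_key_value_heads": 8,
    "vocab_size": 128256,
    "num_hidden_layers": 80,
}

def estimate_params(num_layers, cfg=LLAMA3_70B):
    """Estimate total parameters for a Llama-style model with num_layers decoder layers."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]  # grouped-query attention
    attention = 2 * h * h + 2 * h * kv_dim          # q and o, plus k and v projections
    mlp = 3 * h * cfg["intermediate_size"]          # gate, up, and down projections
    norms = 2 * h                                   # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * cfg["vocab_size"] * h          # untied input and output embeddings
    final_norm = h
    return num_layers * per_layer + embeddings + final_norm

full = estimate_params(LLAMA3_70B["num_hidden_layers"])
for layers in (80, 60, 50, 40):
    count = estimate_params(layers)
    print(f"{layers} layers: {count / 1e9:.2f}B parameters ({count / full:.0%} of full model)")
```

Since each decoder layer contributes the same number of parameters under these assumptions, removing a contiguous block of layers reduces the total linearly, while the embedding matrices stay fixed.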