LEMA-llama-2-7b: A Memory-Efficient Fine-Tuning Proof of Concept
This model, developed by Pomilon, is a 7 billion parameter Llama-2-7B instance fine-tuned using the LEMA (Layer-wise Efficient Memory Abstraction) framework. It serves as a proof of concept, demonstrating that large language models (7B+) can be successfully fine-tuned on consumer-grade hardware with limited VRAM, such as a 16 GB NVIDIA Tesla P100.
Key Capabilities & Differentiators
- Unprecedented Memory Efficiency: Achieved fine-tuning of Llama-2-7B with only 6.36 GB of VRAM, a substantial reduction compared to the 14 GB+ typically required by standard LoRA configurations. This is enabled by LEMA's streaming memory strategy (Disk -> RAM -> VRAM).
- Democratizing LLM Fine-Tuning: LEMA treats model weights as a stream, processing them layer-by-layer to trade computation time for massive memory savings, making LLM fine-tuning accessible on less powerful hardware.
- Stable Training on Low VRAM: Training logs show stable loss convergence over 625 steps (1 epoch) with consistent VRAM usage below 7GB, proving the framework's stability under memory constraints.
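The streaming strategy above can be illustrated with a minimal sketch: only one layer's weights reside in device memory at a time, each loaded from disk on demand and released after use. This is a hypothetical toy implementation for intuition only, not the actual LEMA code; the layer sizes, file layout, and helper names (`save_layers`, `streamed_forward`) are illustrative assumptions.

```python
# Toy sketch of layer-wise weight streaming (Disk -> RAM -> VRAM).
# NOT the actual LEMA implementation -- an illustrative assumption
# using tiny linear layers in place of transformer blocks.
import os
import tempfile

import torch
import torch.nn as nn


def save_layers(num_layers: int, dim: int, directory: str) -> None:
    """Serialize each layer's weights to its own file on disk."""
    for i in range(num_layers):
        layer = nn.Linear(dim, dim)
        torch.save(layer.state_dict(), os.path.join(directory, f"layer_{i}.pt"))


def streamed_forward(x: torch.Tensor, num_layers: int, dim: int,
                     directory: str, device: str = "cpu") -> torch.Tensor:
    """Forward pass holding only ONE layer in memory at a time.

    Per step: read weights from disk into RAM, move them to the compute
    device (VRAM when device="cuda"), apply the layer, then free it --
    trading extra I/O and compute time for a much smaller memory peak.
    """
    for i in range(num_layers):
        state = torch.load(os.path.join(directory, f"layer_{i}.pt"))  # Disk -> RAM
        layer = nn.Linear(dim, dim)
        layer.load_state_dict(state)
        layer.to(device)                                              # RAM -> VRAM
        with torch.no_grad():
            x = torch.relu(layer(x.to(device)))
        del layer, state                                              # release the layer

    return x


with tempfile.TemporaryDirectory() as tmp:
    save_layers(num_layers=4, dim=8, directory=tmp)
    out = streamed_forward(torch.randn(2, 8), num_layers=4, dim=8, directory=tmp)
    print(out.shape)  # torch.Size([2, 8])
```

During training, the same pattern applies per layer to gradients and optimizer state as well, which is where the bulk of the VRAM savings over keeping the full 7B model resident would come from.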
Good For
- Researchers and Developers exploring memory-efficient LLM fine-tuning techniques.
- Understanding the practical application of the LEMA framework.
- Benchmarking and experimenting with low-VRAM LLM training setups.
Limitations
As an experimental proof-of-concept, this model was trained for only one epoch on a small, synthetic dataset. Consequently, it exhibits limitations such as token looping, hallucinations, and overfitting. It is not intended for production use or general-purpose tasks without further extensive training (3-5 epochs on a much larger, diverse dataset).