RedHatAI/TinyLlama-1.1B-Chat-v1.0-pruned2.4
Text Generation · Model Size: 1.1B · Quant: BF16 · Ctx Length: 2k · Published: Jan 30, 2024 · Architecture: Transformer

RedHatAI/TinyLlama-1.1B-Chat-v1.0-pruned2.4 is a 1.1-billion-parameter language model derived from TinyLlama-1.1B-Chat-v1.0 and optimized by RedHatAI. The model was pruned with SparseGPT via SparseML, producing a semi-structured sparse architecture. It is designed for high-throughput serving and low memory usage when deployed with NM-vLLM, making it well suited to efficient inference in chat applications.
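A minimal serving sketch with nm-vLLM might look like the following. This is an assumption-laden illustration, not an official recipe: the `sparsity` argument is an nm-vllm extension whose accepted values vary by release, and the Zephyr-style chat prompt format shown is the one used by the upstream TinyLlama-1.1B-Chat-v1.0 model. Check the nm-vllm documentation for the exact options in your version.

```python
# Hypothetical sketch: run a chat completion with nm-vllm.
# Assumptions: the `sparsity` value below is illustrative and may differ
# per nm-vllm release; running this requires a GPU and downloads the model.
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/TinyLlama-1.1B-Chat-v1.0-pruned2.4",
    sparsity="semi_structured_sparse_w16a16",  # assumed kernel name
)

# Zephyr-style chat template used by TinyLlama-1.1B-Chat-v1.0.
prompt = "<|user|>\nWhat does pruning do to a language model?</s>\n<|assistant|>\n"
params = SamplingParams(max_tokens=128, temperature=0.7)

outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

Because the weights are semi-structured sparse, the sparse kernels skip the pruned weight blocks at inference time, which is where the throughput and memory savings come from.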
