The fnlp/Llama-2-7B-MHA-d_kv_256 model is a 7-billion-parameter Llama-2-based language model developed by fnlp, featuring DeepSeek's Multi-Head Latent Attention (MLA) with a latent KV dimension (d_kv) of 256. The model is designed for economical inference: by integrating MLA into an existing Transformer-based LLM, it compresses the key-value cache and reduces memory cost during deployment.
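Below is a minimal usage sketch, assuming the checkpoint is hosted on the Hugging Face Hub under the ID above and loads through the standard transformers API; since MLA variants often ship custom modeling code with the repository, `trust_remote_code=True` is included as an assumption.

```python
# Hypothetical loading sketch for fnlp/Llama-2-7B-MHA-d_kv_256.
# Assumes: transformers, torch, and accelerate are installed, and that the
# repo provides any custom MLA attention code (hence trust_remote_code=True).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fnlp/Llama-2-7B-MHA-d_kv_256"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    device_map="auto",       # place weights on available GPU(s)/CPU
    trust_remote_code=True,  # assumption: custom attention code in the repo
)

prompt = "Multi-head latent attention reduces KV-cache memory by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```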