quicktensor/blockrank-msmarco-mistral-7b

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Nov 4, 2025 · License: MIT · Architecture: Transformer · Open Weights · Cold

quicktensor/blockrank-msmarco-mistral-7b is a 7 billion parameter language model, fine-tuned from Mistral-7B-Instruct-v0.3 by Nilesh Gupta and collaborators. It is optimized for efficient in-context document ranking using the BlockRank method, which employs structured sparse attention to reduce computational complexity. This model achieves strong zero-shot generalization on BEIR benchmarks and offers 2-4x faster inference for ranking tasks.


Overview

quicktensor/blockrank-msmarco-mistral-7b is a 7-billion-parameter model, fine-tuned from Mistral-7B-Instruct-v0.3 and designed specifically for scalable in-context document ranking. Developed by Nilesh Gupta and collaborators, it integrates the BlockRank method to improve both efficiency and ranking quality.

Key Capabilities

  • Efficient In-context Ranking: Optimized for ranking documents within the model's context window.
  • Linear Complexity Attention: Utilizes structured sparse attention to reduce computational complexity from O(n²) to O(n), making it highly scalable.
  • Faster Inference: Achieves 2-4x faster ranking inference by scoring candidates in a single forward pass rather than through autoregressive decoding.
  • Improved Relevance Signals: Incorporates an auxiliary contrastive loss at mid-layers to strengthen relevance signals.
  • Strong Zero-shot Generalization: Demonstrates state-of-the-art performance on BEIR benchmarks without specific in-domain training.
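The structured sparse attention mentioned above can be illustrated with a small mask-construction sketch. This is a NumPy illustration of the general idea only: the exact block layout, the assumption that documents attend just to a shared prefix plus their own tokens while the query attends to everything, and the name `blockrank_mask` are all assumptions, not the model's actual implementation.

```python
import numpy as np

def blockrank_mask(n_prefix: int, n_docs: int, doc_len: int, n_query: int) -> np.ndarray:
    """Boolean attention mask: True = position may attend.

    Assumed layout: [shared prefix | doc_1 | ... | doc_n | query].
    Document tokens attend to the shared prefix and within their own
    block only; query tokens attend to the whole sequence. Attention
    cost then grows linearly in the number of documents, rather than
    quadratically in total sequence length.
    """
    total = n_prefix + n_docs * doc_len + n_query
    mask = np.zeros((total, total), dtype=bool)

    # Shared prefix attends to itself.
    mask[:n_prefix, :n_prefix] = True

    # Each document block: attend to the prefix plus its own tokens only.
    for d in range(n_docs):
        start = n_prefix + d * doc_len
        end = start + doc_len
        mask[start:end, :n_prefix] = True
        mask[start:end, start:end] = True

    # Query tokens attend to the entire sequence.
    mask[-n_query:, :] = True

    # Causal decoding additionally lower-triangularizes the mask.
    return np.tril(mask)
```

With, say, 3 documents of 4 tokens each behind a 2-token prefix and a 2-token query, cross-document entries of the mask are all zero, which is where the claimed O(n²) → O(n) reduction comes from.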

Good For

This model is ideal for applications requiring efficient and accurate document ranking, particularly as a reranker in retrieval-augmented generation and information-retrieval pipelines. Its optimized architecture makes it suitable for scalable ranking tasks where speed and computational efficiency are critical.
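The scale of the efficiency gains described above can be sanity-checked with back-of-the-envelope attention-cost arithmetic. The sizes below are purely illustrative (not measurements from this model), and the sparse-cost formula assumes the block layout sketched earlier: documents attend to a shared prefix plus themselves, and the query attends to everything.

```python
# Attention pair counts for ranking 50 candidate documents of 60 tokens
# each, behind a 20-token shared prefix and a 20-token query
# (hypothetical sizes, for illustration only).
n_prefix, n_docs, doc_len, n_query = 20, 50, 60, 20
total = n_prefix + n_docs * doc_len + n_query  # 3040 tokens

dense_pairs = total * total  # full O(n^2) attention

# Block-sparse: prefix attends to itself, each document attends to
# the prefix plus its own block, and the query attends to everything.
sparse_pairs = (n_prefix * n_prefix
                + n_docs * doc_len * (n_prefix + doc_len)
                + n_query * total)

# Roughly a 30x reduction in attention pairs at this sequence size.
print(dense_pairs, sparse_pairs, dense_pairs / sparse_pairs)
```

Crucially, the dense cost grows quadratically as more candidates are packed into the context, while the block-sparse cost grows only linearly in the number of documents.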