nixiesearch/nixie-querygen-v2

Task: Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Mar 20, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

nixiesearch/nixie-querygen-v2 is a 7-billion-parameter Mistral-7B-v0.1 model fine-tuned by nixiesearch for query generation, specifically for creating synthetic queries from documents. It supports a 4096-token context length and is intended for expanding datasets for embedding training, or for generating queries when only documents are available. The model follows the docTTTTTquery approach and was trained on 200k query-document pairs from diverse IR datasets.


Overview

nixiesearch/nixie-querygen-v2 is a 7-billion-parameter language model, fine-tuned from Mistral-7B-v0.1 specifically for generating synthetic queries. It addresses the challenge of creating relevant queries when only document collections are available, or when expanding limited query-document datasets for embedding training. It leverages the principles of the docTTTTTquery approach.

Key Capabilities

  • Synthetic Query Generation: Creates queries from documents, useful for downstream embedding fine-tuning tasks where explicit queries are scarce.
  • Dataset Expansion: Enhances existing, small query-document datasets by generating additional synthetic queries based on the provided documents.
  • Flexible Prompting: Supports optional modifiers like [short|medium|long] and [question|regular] to control query characteristics.
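As a minimal sketch of how the modifiers above might be assembled into a generation prompt: the helper below composes a document plus `[short|medium|long]` and `[question|regular]` tags. The exact prompt template is an assumption inferred from the modifiers described here, not taken from the published model card, so verify it against the upstream repository before use.

```python
# Hedged sketch: prompt construction for a query-generation model.
# The "{document} [length] [style] query:" layout is an ASSUMPTION;
# check the actual nixie-querygen-v2 prompt format before relying on it.

def build_prompt(document: str, length: str = "medium", style: str = "regular") -> str:
    """Compose a query-generation prompt with the optional modifiers.

    length: one of "short", "medium", "long"
    style:  one of "question", "regular"
    """
    if length not in ("short", "medium", "long"):
        raise ValueError(f"unknown length modifier: {length}")
    if style not in ("question", "regular"):
        raise ValueError(f"unknown style modifier: {style}")
    return f"{document} [{length}] [{style}] query:"

# Example: ask for a short, question-style query for a document.
prompt = build_prompt("Nixiesearch is a hybrid search engine.", "short", "question")
```

The resulting string would then be passed to whichever inference backend you deploy (a PyTorch checkpoint or a GGUF build under llama.cpp).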

Training Details

The model was trained using nixietune on a dataset of 200,000 query-document pairs, sampled from a variety of Information Retrieval (IR) datasets. It supports a context length of 4096 tokens.

Deployment Options

Available in multiple formats for diverse deployment scenarios:

  • PyTorch FP16 checkpoint: Suitable for further fine-tuning.
  • GGUF F16 (non-quantized): For CPU inference with llama.cpp.
  • GGUF Q4_0 (quantized): For faster, less precise CPU inference with llama.cpp.