NanoLLM-Qwen2.5-14B-v3.1 Overview
This model, RthItalia/NanoLLM-Qwen2.5-14B-v3.1, is a 14.8-billion-parameter variant of the Qwen2.5-Instruct series, enhanced with NanoLLM v3.1 compact overlay artifacts. The NanoLLM process starts from the base Qwen2.5 model in 8-bit and replaces selected modules with TrueQuantLinear modules.
Key Capabilities & Features
- Efficient Quantization: Utilizes NanoLLM's proprietary quantization pipeline to create compact artifacts, enabling more efficient deployment.
- Performance Validation: Artifacts are rigorously validated, passing a gate check for average next-token-logit cosine similarity against an 8-bit reference; this 14B model achieves an average of 0.990625.
- Base Model Integration: The loader starts with the base Qwen2.5-14B-Instruct model in bitsandbytes 8-bit mode, then integrates the NanoLLM-processed modules.
- Research & Evaluation Focus: The generated artifacts are published primarily for research and evaluation purposes.
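The validation gate described above can be sketched as follows. This is an illustrative reimplementation, not the release's actual validation code: the function names are hypothetical, and the 0.99 threshold is an assumption (the card reports an average of 0.990625 for this model but does not state the exact gate value).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two logit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def gate_check(reference_logits, candidate_logits, threshold=0.99):
    """Average per-token cosine similarity against the 8-bit reference.

    reference_logits / candidate_logits: lists of next-token-logit vectors,
    one vector per evaluated position. The threshold is illustrative.
    """
    sims = [
        cosine_similarity(ref, cand)
        for ref, cand in zip(reference_logits, candidate_logits)
    ]
    average = sum(sims) / len(sims)
    return average, average >= threshold
```

In practice the logit vectors would come from running the quantized model and the 8-bit reference over the same evaluation prompts and comparing logits position by position.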
Usage Notes
Users can quickly load and use the model with a provided Python script, which requires torch, transformers, accelerate, bitsandbytes, and safetensors. While the release tests use an 8-bit base, experimental 4-bit loading is possible. This model is particularly suited for research or deployment scenarios where high fidelity under quantization is crucial.
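The first step of the loading flow can be sketched as below. This is a hedged illustration, not the repo's actual loader script: the function name `load_base_8bit` is hypothetical, and the overlay-integration step (swapping in the NanoLLM-processed TrueQuantLinear modules) is loader-specific and not shown.

```python
def load_base_8bit(model_id="Qwen/Qwen2.5-14B-Instruct"):
    """Step one of the flow: load the base model in bitsandbytes 8-bit mode.

    Illustrative sketch only; the release ships its own loader script.
    """
    # Imports are deferred so the heavy dependencies load only when called.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(load_in_8bit=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )
    # Step two (not shown): integrate the NanoLLM-processed modules
    # (TrueQuantLinear) into the loaded model.
    return tokenizer, model
```

Loading a 14B model in 8-bit requires roughly 15 GB of GPU memory plus overhead; for the experimental 4-bit path, `BitsAndBytesConfig(load_in_4bit=True)` could be substituted, keeping in mind the release only validates against the 8-bit base.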