NanoLLM-Qwen2.5-14B-v3.1 Overview
This model, RthItalia/NanoLLM-Qwen2.5-14B-v3.1, is a 14.8-billion-parameter variant of the Qwen2.5-Instruct series, enhanced with NanoLLM v3.1 compact overlay artifacts. The NanoLLM process starts from the base Qwen2.5 model in 8-bit and replaces selected modules with TrueQuantLinear modules.
Key Capabilities & Features
- Efficient Quantization: Utilizes NanoLLM's proprietary quantization pipeline to create compact artifacts, enabling more efficient deployment.
- Performance Validation: Artifacts are rigorously validated, passing a gate check for average next-token-logit cosine similarity against an 8-bit reference; this 14B model achieves an average of 0.990625.
- Base Model Integration: The loader starts with the base Qwen2.5-14B-Instruct model in bitsandbytes 8-bit mode, then integrates the NanoLLM-processed modules.
- Research & Evaluation Focus: The generated artifacts are published primarily for research and evaluation purposes.
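The validation gate described above can be sketched as follows. This is an illustrative reimplementation, not the release's actual validation code: the function names are hypothetical, and the 0.99 threshold is an assumption (the card reports an average of 0.990625 for this model but does not state the exact gate value).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two logit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def gate_check(reference_logits, candidate_logits, threshold=0.99):
    """Average per-token cosine similarity against the 8-bit reference.

    reference_logits / candidate_logits: lists of next-token-logit vectors,
    one vector per evaluated position. The threshold is illustrative.
    """
    sims = [
        cosine_similarity(ref, cand)
        for ref, cand in zip(reference_logits, candidate_logits)
    ]
    average = sum(sims) / len(sims)
    return average, average >= threshold
```

In practice the logit vectors would come from running the quantized model and the 8-bit reference over the same evaluation prompts and comparing logits position by position.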
Usage Notes
Users can quickly load and use the model with a provided Python script, which requires torch, transformers, accelerate, bitsandbytes, and safetensors. While the release tests use an 8-bit base, experimental 4-bit loading is possible. This model is particularly suited for research or deployment scenarios where high fidelity under quantization is crucial.
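The first step of the loading flow can be sketched as below. This is a hedged illustration, not the repo's actual loader script: the function name `load_base_8bit` is hypothetical, and the overlay-integration step (swapping in the NanoLLM-processed TrueQuantLinear modules) is loader-specific and not shown.

```python
def load_base_8bit(model_id="Qwen/Qwen2.5-14B-Instruct"):
    """Step one of the flow: load the base model in bitsandbytes 8-bit mode.

    Illustrative sketch only; the release ships its own loader script.
    """
    # Imports are deferred so the heavy dependencies load only when called.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(load_in_8bit=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )
    # Step two (not shown): integrate the NanoLLM-processed modules
    # (TrueQuantLinear) into the loaded model.
    return tokenizer, model
```

Loading a 14B model in 8-bit requires roughly 15 GB of GPU memory plus overhead; for the experimental 4-bit path, `BitsAndBytesConfig(load_in_4bit=True)` could be substituted, keeping in mind the release only validates against the 8-bit base.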