RthItalia/NanoLLM-Qwen2.5-14B-v3.1
Text Generation | Concurrency Cost: 1 | Model Size: 14.8B | Quant: FP8 | Ctx Length: 32k | Published: Apr 6, 2026 | License: other | Architecture: Transformer | Cold

RthItalia/NanoLLM-Qwen2.5-14B-v3.1 is a 14.8-billion-parameter model based on Qwen2.5 and optimized with NanoLLM compact overlay artifacts. It integrates `TrueQuantLinear` modules into an 8-bit base and reports an average next-token-logit cosine similarity of at least 0.99 against the 8-bit reference. It is intended for efficient deployment and research evaluation, reducing artifact size while keeping outputs close to those of the reference.
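
As a rough illustration of the metric quoted above, the sketch below averages per-prompt cosine similarity between two sets of next-token logit vectors. The card does not say how the figure was actually measured, so this is only one plausible reading: the function names `cosine_similarity` and `average_logit_cosine` are hypothetical, and the logit values are made-up toy numbers, not outputs of this model or its reference.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two logit vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("logit vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def average_logit_cosine(
    candidate: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
) -> float:
    """Mean cosine similarity over paired next-token logit vectors.

    Each pair is assumed to come from the same prompt, one vector from
    the candidate model and one from the reference model.
    """
    if not candidate or len(candidate) != len(reference):
        raise ValueError("need the same non-zero number of vectors on both sides")
    sims = [cosine_similarity(c, r) for c, r in zip(candidate, reference)]
    return sum(sims) / len(sims)


# Toy logits over a 4-token vocabulary for two prompts (illustrative only).
reference_logits = [[2.0, -1.0, 0.5, 3.0], [0.1, 4.0, -2.0, 1.0]]
candidate_logits = [[2.1, -0.9, 0.4, 3.0], [0.0, 3.9, -2.1, 1.1]]

avg = average_logit_cosine(candidate_logits, reference_logits)
print(f"average next-token-logit cosine similarity: {avg:.4f}")
```

In a real evaluation the two lists would hold the final-position logits produced by the overlay model and by the 8-bit reference for the same prompts, and the average would be compared against the 0.99 threshold.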
