RthItalia/NanoLLM-Qwen2.5-7B-v3.1
Text generation · Concurrency cost: 1 · Model size: 7.6B · Quant: FP8 · Context length: 32k · Published: Apr 6, 2026 · License: other · Architecture: Transformer

RthItalia/NanoLLM-Qwen2.5-7B-v3.1 is a 7.6 billion parameter model based on the Qwen2.5 architecture, developed by RthItalia. The model uses compact overlay artifacts to optimize Qwen2.5 models: starting from an 8-bit base, it replaces modules with TrueQuantLinear for efficiency. It is intended for research and evaluation of quantized large language models, with a focus on maintaining high cosine similarity to the 8-bit reference. Its primary use case is to provide a compact, efficient version of Qwen2.5 for deployment in resource-constrained environments.
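The "high cosine similarity to the 8-bit reference" criterion can be illustrated with a small script. This is a minimal sketch, not the author's evaluation code: the `cosine_similarity` helper and the sample activation vectors are illustrative assumptions, standing in for real layer outputs captured before and after module replacement.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors:
    # dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical example: compare a layer's output from the 8-bit
# reference against the same layer after TrueQuantLinear replacement.
reference_out = [0.12, -0.53, 0.88, 0.04]
quantized_out = [0.11, -0.52, 0.87, 0.05]

print(f"cosine similarity: {cosine_similarity(reference_out, quantized_out):.4f}")
```

A value close to 1.0 indicates the quantized module closely tracks the reference; in practice this check would be run per layer over real activations rather than toy vectors.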
