RthItalia/NanoLLM-Qwen2.5-3B-v3.1
Task: Text generation
Concurrency cost: 1
Model size: 3.1B
Quant: BF16
Context length: 32k
Published: Apr 6, 2026
License: other
Architecture: Transformer

RthItalia/NanoLLM-Qwen2.5-3B-v3.1 is a 3.1-billion-parameter language model based on the Qwen2.5 architecture, developed by RthItalia. It uses compact overlay artifacts to apply a proprietary quantization pipeline that replaces modules with `TrueQuantLinear` for efficient deployment. It is designed to run Qwen2.5 models with reduced memory usage, starting from an 8-bit base model. Its next-token logits stay highly similar (by cosine similarity) to those of the reference model, which makes it suitable for applications that need compact yet accurate language processing.
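The card does not say how the logit cosine similarity is measured or what `TrueQuantLinear` does internally. As a rough illustration of the metric only, here is a minimal sketch. It quantizes a toy output projection with a generic symmetric per-row int8 scheme, which is an assumption and not the proprietary pipeline. It then compares the resulting next-token logits with the full-precision ones by cosine similarity. All function names are hypothetical, and the layer sizes are toy values.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def quantize_rows_int8(weight):
    """Hypothetical generic scheme: symmetric per-row int8, one float scale per row."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(w) for w in row) or 1.0  # avoid a zero scale for all-zero rows
        scale = max_abs / 127.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def linear(weight, x):
    """Full-precision matrix-vector product (no bias)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def quantized_linear(q_rows, scales, x):
    """Integer-weight dot product, rescaled by each row's scale."""
    return [s * sum(q * xi for q, xi in zip(row, x)) for row, s in zip(q_rows, scales)]


if __name__ == "__main__":
    rng = random.Random(0)
    hidden, vocab = 64, 1000  # toy sizes, not the real model's dimensions
    lm_head = [[rng.gauss(0.0, 0.02) for _ in range(hidden)] for _ in range(vocab)]
    hidden_state = [rng.gauss(0.0, 1.0) for _ in range(hidden)]

    reference_logits = linear(lm_head, hidden_state)
    q_rows, scales = quantize_rows_int8(lm_head)
    quant_logits = quantized_linear(q_rows, scales, hidden_state)

    sim = cosine_similarity(reference_logits, quant_logits)
    print(f"next-token-logit cosine similarity: {sim:.6f}")
```

A value close to 1.0 means the quantized layer ranks and scales next-token scores almost exactly like the original. For a sense of the memory at stake, 3.1B weights take roughly 6.2 GB at two bytes each (BF16) and roughly 3.1 GB at one byte each (8-bit). That estimate excludes scales, activations, and the KV cache.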
