Kunger/Sakura-14B-Qwen2.5-v1.0
Text Generation · Model Size: 14.8B · Quant: FP8 · Context Length: 32k · Published: Nov 26, 2024 · License: cc-by-nc-sa-4.0 · Architecture: Transformer · Open Weights
Kunger/Sakura-14B-Qwen2.5-v1.0 is a 14.8-billion-parameter causal language model based on Qwen2.5. It is a dequantized version of SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF, intended for PyTorch inference in setups where llama.cpp support is limited. It can be used directly with the transformers library, but because it was dequantized from a Q6_K quantization, its precision is lower than F16.
Model Overview
Kunger/Sakura-14B-Qwen2.5-v1.0 is a 14.8 billion parameter model derived from the SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF. Its primary purpose is to provide a dequantized version suitable for PyTorch inference using the Hugging Face Transformers library, particularly for environments where llama.cpp support or performance is suboptimal.
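Loading the checkpoint through Transformers can be sketched as follows. This is a minimal, hedged example rather than an official recipe: the `bfloat16` dtype, `device_map="auto"` (which requires the `accelerate` package), greedy decoding, and plain-text prompting are all assumptions, since this card does not specify a prompt format. The helper names `load_model` and `generate` are hypothetical.

```python
# Minimal sketch: running the dequantized checkpoint with Hugging Face Transformers.
# Assumptions (not stated in the model card): bfloat16 weights in memory,
# device_map="auto" (needs `accelerate`), greedy decoding, plain-text prompts.
MODEL_ID = "Kunger/Sakura-14B-Qwen2.5-v1.0"


def load_model(model_id: str = MODEL_ID, dtype_name: str = "bfloat16"):
    """Download (on first use) and load the model and its bundled tokenizer."""
    # Imported lazily so this module can be imported without torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=getattr(torch, dtype_name),
        device_map="auto",
    )
    return model, tokenizer


def generate(model, tokenizer, prompt: str, max_new_tokens: int = 256) -> str:
    """Greedily decode a continuation of `prompt` and return only the new text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so only the generated continuation is decoded.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Typical use would be `model, tokenizer = load_model()` followed by `print(generate(model, tokenizer, prompt))`. Loading the weights lazily inside a function keeps the heavy download out of import time.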
Key Characteristics
- Dequantized Model: This version was produced by dequantizing the Q6_K quantization of the original GGUF model.
- PyTorch Inference: Designed to run inference directly with PyTorch, bypassing llama.cpp for certain setups.
- Precision Considerations: Because the weights were dequantized from Q6_K, the model's precision is significantly lower than F16, and its inference results have not been extensively tested for accuracy.
- Tokenizer Note: The tokenizer's vocabulary may have changed during dequantization. If this causes problems, the tokenizer from the original Qwen2.5 model can be used instead.
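To check whether the vocabulary actually changed, one option is to diff the bundled tokenizer against a reference Qwen2.5 tokenizer. The card does not say which Qwen2.5 checkpoint to use, so `Qwen/Qwen2.5-14B-Instruct` below is an assumption, and `vocab_diff` and `compare_tokenizers` are hypothetical helpers; only `AutoTokenizer.from_pretrained` and `get_vocab()` come from transformers.

```python
from typing import Dict, List, Tuple

# Assumption: the card only says "the original Qwen2.5 model"; this ID is a guess.
REFERENCE_ID = "Qwen/Qwen2.5-14B-Instruct"


def vocab_diff(
    dequantized: Dict[str, int], reference: Dict[str, int]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two token -> id vocabularies.

    Returns (missing, extra, remapped): tokens only in `reference`, tokens only
    in `dequantized`, and tokens present in both but mapped to different ids.
    """
    missing = sorted(set(reference) - set(dequantized))
    extra = sorted(set(dequantized) - set(reference))
    shared = set(reference) & set(dequantized)
    remapped = sorted(t for t in shared if reference[t] != dequantized[t])
    return missing, extra, remapped


def compare_tokenizers(model_id: str, reference_id: str = REFERENCE_ID):
    """Hypothetical helper: load both tokenizers and diff their vocabularies."""
    from transformers import AutoTokenizer  # lazy import; downloads on first use

    deq = AutoTokenizer.from_pretrained(model_id)
    ref = AutoTokenizer.from_pretrained(reference_id)
    return vocab_diff(deq.get_vocab(), ref.get_vocab())
```

If any of the three lists returned by `compare_tokenizers("Kunger/Sakura-14B-Qwen2.5-v1.0")` is non-empty, the reference tokenizer can be loaded with `AutoTokenizer.from_pretrained(REFERENCE_ID)` and used in place of the bundled one.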
When to Consider This Model
- PyTorch-centric Workflows: Ideal for developers who prefer or require using the Hugging Face Transformers library with PyTorch for inference.
- Limited llama.cpp Support: Useful in environments where llama.cpp performance is constrained or where it is not fully supported.
Limitations
- The model's precision is lower than F16 because it was dequantized from Q6_K.
- The impact of dequantization on inference results has not been thoroughly evaluated.
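Why dequantization cannot restore F16 precision can be illustrated with a toy quantize/dequantize round trip. This is a simplified symmetric 6-bit scheme with one scale per 16-weight block, not the actual GGUF Q6_K layout, and the float16 output dtype, block size, and synthetic weights are assumptions chosen for illustration. Whatever dtype the dequantized weights are stored in, each block can only take the values its 6-bit codes allow, so the rounding error introduced by quantization remains.

```python
import numpy as np


def quantize_6bit(weights: np.ndarray, block_size: int = 16):
    """Toy symmetric 6-bit quantization with one float scale per block.

    Simplified illustration only; NOT the real GGUF Q6_K format.
    Assumes weights.size is a multiple of block_size.
    """
    blocks = weights.reshape(-1, block_size)
    # Signed 6-bit range is [-32, 31]; scale so the largest |w| maps to 31.
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 31.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scales), -32, 31).astype(np.int8)
    return q, scales


def dequantize(q: np.ndarray, scales: np.ndarray, dtype=np.float16) -> np.ndarray:
    """Expand integer codes back to floats (float16 here, for illustration)."""
    return (q.astype(np.float32) * scales).astype(dtype).reshape(-1)


rng = np.random.default_rng(0)
original = rng.standard_normal(4096).astype(np.float32) * 0.02  # synthetic weights
q, scales = quantize_6bit(original)
restored = dequantize(q, scales)

# Stored as float16, yet each block holds at most 64 distinct values,
# so the quantization error is still present after "dequantizing".
err = np.abs(restored.astype(np.float32) - original)
print("max abs error after round trip:", err.max())
print("distinct values in first block:", np.unique(restored[:16]).size)
```

Casting the original weights straight to float16 gives a much smaller error than the round trip, which is the gap the card describes as precision lower than F16.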