Name: Kunger/Sakura-14B-Qwen2.5-v1.0 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: In stock
Author: Kunger

Model Overview

Kunger/Sakura-14B-Qwen2.5-v1.0 is a 14.8 billion parameter model derived from SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF. It provides a dequantized version of that model for PyTorch inference with the Hugging Face Transformers library, aimed at environments where llama.cpp is poorly supported or performs suboptimally.

Key Characteristics

Dequantized Model: This version was dequantized from the Q6K quantization of the original GGUF model.
PyTorch Inference: Designed for inference directly with PyTorch, bypassing llama.cpp in setups where that is preferable.
Precision Considerations: Because the weights were dequantized from Q6K, the model's precision is significantly lower than F16, and its inference results have not been extensively tested for accuracy.
Tokenizer Note: The tokenizer's vocabulary may have changed after dequantization. If this causes problems, the tokenizer from the original Qwen2.5 model can be used as a replacement.
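
The following is a minimal sketch of how loading this model with Transformers might look, including the tokenizer swap mentioned in the Tokenizer Note. The model repo id comes from this page. The fallback tokenizer repo (Qwen/Qwen2.5-14B-Instruct), the helper names, and the loading options are assumptions for illustration, not settings confirmed by the model author.

```python
# Repo id taken from this page.
MODEL_REPO = "Kunger/Sakura-14B-Qwen2.5-v1.0"
# Assumption: which original Qwen2.5 repo to borrow the tokenizer from.
FALLBACK_TOKENIZER_REPO = "Qwen/Qwen2.5-14B-Instruct"


def tokenizer_repo(use_fallback: bool = False) -> str:
    """Return the repo id the tokenizer should be loaded from."""
    return FALLBACK_TOKENIZER_REPO if use_fallback else MODEL_REPO


def load(use_fallback_tokenizer: bool = False):
    """Load model and tokenizer. This downloads the full weights."""
    # Imported lazily so tokenizer_repo() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_repo(use_fallback_tokenizer))
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_REPO,
        torch_dtype="auto",  # use the dtype stored in the checkpoint
        device_map="auto",   # needs the `accelerate` package
    )
    return model, tokenizer


def generate(model, tokenizer, prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a continuation and return only the newly produced text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling load(use_fallback_tokenizer=True) would keep the dequantized weights but take the tokenizer from the assumed Qwen2.5 repo, which is one way to try the replacement if the bundled tokenizer misbehaves.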

When to Consider This Model

PyTorch-centric Workflows: Ideal for developers who prefer or require using the Hugging Face Transformers library with PyTorch for inference.
Limited llama.cpp Support: Useful in environments where llama.cpp performs poorly or is not fully supported.

Limitations

The model's precision is lower than F16 because the weights were dequantized from Q6K; dequantization does not restore the information lost during quantization.
The effect of dequantization on inference results has not been thoroughly evaluated.
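
To illustrate why dequantized weights stay less precise than the original, here is a deliberately simplified sketch of a 6-bit quantize-then-dequantize round trip. It uses a single shared scale and made-up sample weights, and it is not the actual Q6K block format used by GGUF. It only shows that values come back snapped to a coarse grid.

```python
def quantize_roundtrip(weights, bits=6):
    """Quantize to signed `bits`-bit integers with one shared scale, then dequantize.

    Simplified illustration only; assumes at least one non-zero weight.
    """
    qmax = 2 ** (bits - 1) - 1                    # largest integer level (31 for 6 bits)
    scale = max(abs(w) for w in weights) / qmax   # size of one quantization step
    quantized = [round(w / scale) for w in weights]
    restored = [q * scale for q in quantized]
    return restored, scale


# Hypothetical sample weights, chosen only for the demonstration.
weights = [0.8131, -0.2507, 0.0419, -0.6602, 0.3375]
restored, scale = quantize_roundtrip(weights)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"step size: {scale:.5f}, max round-trip error: {max_error:.5f}")
```

In this sketch the round-trip error of each weight is at most half a step, and converting the restored values to a wider floating-point type afterwards does not shrink it.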
