Shad0ws/Vicuna13B: Optimized for Local Inference
Shad0ws/Vicuna13B is a 13 billion parameter language model, derived from the lmsys/vicuna-13b-delta-v0 base model and converted using GPTQ quantization. This conversion specifically targets efficient deployment on local hardware, enabling users to run a powerful Vicuna variant with reduced memory footprint and faster inference speeds.
Key Characteristics
- Base Model: lmsys/vicuna-13b-delta-v0
- Quantization: 4-bit GPTQ with a group size of 128, optimized for CUDA devices.
- Efficiency: Designed for local execution, offering a balance of performance and resource usage.
- Tokenizer: Includes an added token to the original tokenizer model, potentially enhancing specific use cases or compatibility.
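The 4-bit, group-size-128 scheme described above can be illustrated with a minimal sketch of group-wise quantization. This is a simplified round-to-nearest illustration of how grouped scales and zero-points are stored, not the actual GPTQ algorithm, which additionally uses second-order information to minimize layer output error; all function names here are illustrative.

```python
import numpy as np

def quantize_groupwise(weights, bits=4, group_size=128):
    """Quantize a 1-D weight vector in groups, storing one scale and
    zero-point per group. Illustrates the storage scheme only; real GPTQ
    chooses quantized values to minimize layer output error."""
    qmax = 2**bits - 1  # 15 for 4-bit
    q, scales, zeros = [], [], []
    for start in range(0, len(weights), group_size):
        g = weights[start:start + group_size]
        lo, hi = g.min(), g.max()
        scale = (hi - lo) / qmax if hi > lo else 1.0
        q.append(np.round((g - lo) / scale).astype(np.uint8))
        scales.append(scale)
        zeros.append(lo)
    return np.concatenate(q), np.array(scales), np.array(zeros)

def dequantize_groupwise(q, scales, zeros, group_size=128):
    """Reconstruct approximate float weights from grouped 4-bit codes."""
    out = np.empty(len(q), dtype=np.float32)
    for i, start in enumerate(range(0, len(q), group_size)):
        out[start:start + group_size] = q[start:start + group_size] * scales[i] + zeros[i]
    return out

np.random.seed(0)
w = np.random.randn(256).astype(np.float32)
q, s, z = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, s, z)
print(np.abs(w - w_hat).max())  # per-weight error is bounded by half a quantization step
```

Smaller group sizes shrink the quantization error (each scale covers a narrower range of values) at the cost of storing more scales and zero-points; a group size of 128 is a common middle ground.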
Use Cases
This model is particularly well-suited for:
- General Conversational AI: Engaging in dialogue, answering questions, and generating human-like text.
- Local Development: Experimenting with large language models on personal machines without extensive cloud resources.
- Resource-Constrained Environments: Deploying powerful language capabilities where memory and computational power are limited.
Users can load this model with tools such as Oobabooga's text-generation-webui, passing --wbits 4 and --groupsize 128 so the loader matches the model's quantization settings.
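Assuming the model files have been downloaded into the webui's models/ directory under a folder named Vicuna13B (a placeholder name chosen here for illustration), a launch command might look like:

```shell
# Start text-generation-webui with flags matching this model's quantization.
# "Vicuna13B" is a placeholder for whatever folder the model was saved under.
python server.py --model Vicuna13B --wbits 4 --groupsize 128
```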