y22ma/qwen4b-finetune
The y22ma/qwen4b-finetune model is a 4-billion-parameter language model based on Qwen3, finetuned and then converted to GGUF format. Finetuning was accelerated with Unsloth, and the GGUF conversion makes the model suitable for efficient deployment and inference on local hardware. It is intended for general text-based applications, relying on its Qwen3 architecture for language understanding and generation.
Model Overview
The y22ma/qwen4b-finetune is a 4-billion-parameter language model based on the Qwen3 architecture. This version has been finetuned and subsequently converted into the GGUF format, making it compatible with llama.cpp and other local inference engines. Finetuning used Unsloth, which is reported to deliver roughly 2x faster training.
Key Features
- Qwen3 Architecture: Leverages the capabilities of the Qwen3 model family.
- GGUF Format: Provided in the efficient GGUF format, ideal for CPU and local GPU inference.
- Unsloth Optimization: Finetuned with Unsloth for accelerated training, suggesting further customization can also be done efficiently.
- Ollama Support: Includes an Ollama Modelfile for streamlined deployment and use within the Ollama ecosystem.
- Context Length: Supports a context window of 40,960 tokens, allowing it to process extensive inputs.
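As an illustration of the Ollama support listed above, a minimal Modelfile might look like the following. This is a sketch only: the GGUF filename is an assumption, not taken from the repository, and the repository's own Modelfile should be preferred.

```
# Hypothetical Modelfile; the GGUF filename is an assumption.
FROM ./qwen4b-finetune.Q4_K_M.gguf

# Match the model's supported context length.
PARAMETER num_ctx 40960
```

With such a file in place, the model could be registered and run with `ollama create qwen4b-finetune -f Modelfile` followed by `ollama run qwen4b-finetune`.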
Usage and Deployment
This model is designed for straightforward deployment with llama.cpp and Ollama. The repository provides example commands for both text-only and multimodal llama.cpp usage, and the included Ollama Modelfile simplifies integration for users who prefer that platform.
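As a sketch of text-only llama.cpp usage (the GGUF filename and quantization suffix are assumptions; consult the repository's own example commands for the exact names):

```shell
# Run text-only inference with llama.cpp's CLI.
# The model filename below is an assumption, not from the repository.
llama-cli -m ./qwen4b-finetune.Q4_K_M.gguf \
  -p "Explain the GGUF format in one paragraph." \
  -c 40960 \
  -n 256
```

Multimodal llama.cpp usage typically requires an additional multimodal projector (mmproj) file alongside the main GGUF; whether this repository ships one should be checked against its file listing.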