FritzStack/IRF-Llama-3.2-3B_4bit-merged-mlx-fp16

Text Generation · Model Size: 3.2B · Quant: BF16 · Context Length: 32k · Concurrency Cost: 1 · Published: Feb 25, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

FritzStack/IRF-Llama-3.2-3B_4bit-merged-mlx-fp16 is a 3.2 billion parameter Llama-3.2 model from FritzStack, converted to the MLX format for efficient deployment on Apple Silicon. It is a 4-bit merged variant, optimized for inference speed and a reduced memory footprint, and is intended for general-purpose text generation where a capable yet resource-efficient model is needed on MLX-compatible hardware.


Model Overview

FritzStack/IRF-Llama-3.2-3B_4bit-merged-mlx-fp16 is a 3.2 billion parameter language model based on the Llama-3.2 architecture, developed by FritzStack and converted to the MLX format for optimized inference on Apple Silicon. As a 4-bit merged variant, it trades some numerical precision for lower memory consumption and faster inference.

Key Characteristics

  • Architecture: Llama-3.2 base model.
  • Parameter Count: 3.2 billion parameters.
  • Quantization: 4-bit merged for efficiency.
  • Format: Converted to MLX format using mlx-lm version 0.29.1, enabling native execution on Apple Silicon.
  • Context Length: Supports a context length of 32768 tokens.
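Because the model ships in MLX format, it can be loaded directly with the `mlx-lm` Python package. The sketch below is illustrative, not an official quickstart from this repository: it assumes `mlx-lm` is installed (`pip install mlx-lm`) and that you are on Apple Silicon; the prompt text is made up, and the repo id is the one named in this card.

```python
# Minimal local-inference sketch with mlx-lm (assumed installed; Apple Silicon only).
from mlx_lm import load, generate

# Downloads (or reuses a cached copy of) the MLX weights and tokenizer.
model, tokenizer = load("FritzStack/IRF-Llama-3.2-3B_4bit-merged-mlx-fp16")

prompt = "Summarize the benefits of 4-bit quantization in one sentence."

# Use the chat template if the tokenizer defines one (Llama-3.2 chat models do).
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```

The `load`/`generate` pair is the standard `mlx-lm` entry point; swapping in a different repo id is all that is needed to run another MLX-converted model.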

Use Cases

This model is particularly well-suited for developers and applications that require a capable language model to run efficiently on Apple Silicon hardware. Its 4-bit quantization and MLX conversion make it ideal for:

  • Local inference on macOS devices.
  • Applications where memory footprint and inference speed are critical.
  • General text generation, summarization, and question-answering tasks within resource-constrained environments.
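The memory-footprint claim above can be sanity-checked with back-of-envelope arithmetic. This is an illustrative estimate of weight storage only (not a measurement of this model), ignoring KV cache, activations, and quantization scale overhead:

```python
# Illustrative weight-memory estimate; real usage adds KV cache and overhead.
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

fp16 = weight_memory_gb(3.2e9, 16)  # 16-bit weights
q4 = weight_memory_gb(3.2e9, 4)     # 4-bit weights
print(f"fp16: {fp16:.1f} GB, 4-bit: {q4:.1f} GB")
# → fp16: 6.4 GB, 4-bit: 1.6 GB
```

At roughly 1.6 GB of weights, a 4-bit 3.2B model fits comfortably in the unified memory of entry-level Apple Silicon machines, which is what makes local macOS inference practical.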