FritzStack/IRF-Llama-3.2-3B_4bit-merged-mlx-fp16
FritzStack/IRF-Llama-3.2-3B_4bit-merged-mlx-fp16 is a 3-billion-parameter Llama-3.2 model, developed by FritzStack and converted to the MLX format for efficient deployment on Apple Silicon. This release is a merged, 4-bit-quantized variant, optimized for inference speed and a reduced memory footprint. It is designed for general-purpose language generation, and its primary utility lies in applications that need a capable yet resource-efficient language model on MLX-compatible hardware.
Model Overview
FritzStack/IRF-Llama-3.2-3B_4bit-merged-mlx-fp16 is a 3-billion-parameter language model based on the Llama-3.2 architecture, developed by FritzStack. This specific version has been converted to the MLX format, making it suitable for optimized inference on Apple Silicon. It is a merged, 4-bit-quantized variant, reflecting a focus on efficiency and reduced memory consumption while maintaining performance.
Key Characteristics
- Architecture: Llama-3.2 base model.
- Parameter Count: 3 billion parameters (the "3B" in the model name).
- Quantization: 4-bit merged for efficiency.
- Format: Converted to MLX format using mlx-lm version 0.29.1, enabling native execution on Apple Silicon.
- Context Length: Supports a context length of 32,768 tokens.
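Since the weights were converted with mlx-lm, the quickest way to try the model is that package's command-line entry point. A sketch (requires Apple Silicon; the prompt text is illustrative, and exact flags may vary across mlx-lm versions):

```shell
# Install the MLX language-model tooling (Apple Silicon only)
pip install mlx-lm

# Stream a completion; the weights are fetched from the Hugging Face Hub on first use
mlx_lm.generate \
  --model FritzStack/IRF-Llama-3.2-3B_4bit-merged-mlx-fp16 \
  --prompt "Explain the MLX framework in one paragraph." \
  --max-tokens 256
```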
Use Cases
This model is particularly well-suited for developers and applications that require a capable language model to run efficiently on Apple Silicon hardware. Its 4-bit quantization and MLX conversion make it ideal for:
- Local inference on macOS devices.
- Applications where memory footprint and inference speed are critical.
- General text generation, summarization, and question-answering tasks within resource-constrained environments.
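For local inference from Python, the standard mlx-lm `load`/`generate` API applies. A minimal sketch, assuming an Apple Silicon machine with mlx-lm installed (the example prompt and `max_tokens` value are illustrative):

```python
from mlx_lm import load, generate

# Download (on first use) and load the 4-bit MLX weights plus tokenizer
model, tokenizer = load("FritzStack/IRF-Llama-3.2-3B_4bit-merged-mlx-fp16")

# Format the request with the tokenizer's chat template, as for other
# Llama-3.2 instruct-style models
messages = [{"role": "user", "content": "Summarize the benefits of 4-bit quantization."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Run generation; max_tokens bounds the response length
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```

Because the weights are 4-bit quantized, the full model typically fits comfortably in unified memory on consumer Macs, which is what makes the resource-constrained use cases above practical.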