alexgusevski/Llama-3.3-8B-Instruct-128K_Abliterated-mlx-fp16

Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Ctx length: 32k · Published: Jan 12, 2026 · License: llama3.3 · Architecture: Transformer

The alexgusevski/Llama-3.3-8B-Instruct-128K_Abliterated-mlx-fp16 model is an 8 billion parameter instruction-tuned language model, converted to the MLX format for efficient deployment. Based on the Llama-3.3 architecture, this model is designed for general instruction-following tasks. Its primary differentiator is the MLX conversion, which enables optimized performance on Apple silicon.


Model Overview

This model, alexgusevski/Llama-3.3-8B-Instruct-128K_Abliterated-mlx-fp16, is an 8 billion parameter instruction-tuned variant of the Llama-3.3 architecture. It was converted to the MLX format using mlx-lm version 0.29.1 from the SicariusSicariiStuff/Llama-3.3-8B-Instruct-128K_Abliterated base model. The "Abliterated" designation conventionally indicates that the model's refusal behavior has been suppressed via activation ablation, though the source README does not document the specific modifications made to this model.
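MLX conversions of this kind are typically produced with the mlx_lm.convert command-line utility. A sketch of how such a conversion might look (the output path is illustrative, and exact flags can vary between mlx-lm versions):

```shell
# Install mlx-lm (requires Apple silicon).
pip install mlx-lm

# Convert the Hugging Face base model to MLX format at float16 precision.
# --hf-path, --mlx-path, and --dtype are mlx_lm.convert options; the
# output directory name here is illustrative.
mlx_lm.convert \
  --hf-path SicariusSicariiStuff/Llama-3.3-8B-Instruct-128K_Abliterated \
  --mlx-path ./Llama-3.3-8B-Instruct-128K_Abliterated-mlx-fp16 \
  --dtype float16
```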

Key Characteristics

  • Architecture: Llama-3.3-8B-Instruct
  • Parameter Count: 8 billion parameters
  • Context Length: The model name indicates a 128K-token context window, though the published model info lists 32,768 tokens.
  • Format: MLX-converted, optimized for Apple silicon.
  • Instruction-tuned: Designed to follow user instructions effectively.

Use Cases

This model is suitable for developers looking to leverage an instruction-following LLM on Apple hardware. Its MLX conversion makes it ideal for:

  • Local inference on devices with Apple silicon.
  • General natural language understanding and generation tasks.
  • Applications requiring a capable 8B parameter model with a large context window.
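Local inference on Apple silicon can be sketched with the mlx-lm Python API. This example assumes mlx-lm is installed and the weights are downloadable from the Hugging Face Hub; the prompt is illustrative:

```python
# Requires Apple silicon and `pip install mlx-lm`.
from mlx_lm import load, generate

# Download (or load from the local cache) the MLX-converted weights and tokenizer.
model, tokenizer = load("alexgusevski/Llama-3.3-8B-Instruct-128K_Abliterated-mlx-fp16")

# Format the conversation with the model's own chat template so the
# instruction-tuned weights see the prompt structure they were trained on.
messages = [{"role": "user", "content": "Summarize the Llama-3.3 architecture in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate a completion; max_tokens bounds the response length.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```

At fp16, the 8B weights occupy roughly 16 GB, so a Mac with at least 24 GB of unified memory is a reasonable baseline for comfortable inference.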