weber50432/lora-Meta-Llama-3.1-8B-Instruct

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Feb 28, 2025 · License: llama3.1 · Architecture: Transformer

weber50432/lora-Meta-Llama-3.1-8B-Instruct is an 8 billion parameter instruction-tuned causal language model based on Meta's Llama-3.1 architecture and converted to MLX format. It offers a 32,768 token context length and is designed for efficient deployment and inference within the MLX framework, making it suitable for applications requiring local execution on Apple Silicon.


Model Overview

This model, weber50432/lora-Meta-Llama-3.1-8B-Instruct, is an 8 billion parameter instruction-tuned language model based on Meta's Llama-3.1 architecture. It has been specifically converted to the MLX format using mlx-lm version 0.21.1, enabling optimized performance on Apple Silicon.

Key Characteristics

  • Architecture: Derived from Meta-Llama-3.1-8B-Instruct.
  • Parameter Count: 8 billion parameters.
  • Context Length: Supports a substantial context window of 32,768 tokens.
  • Format: Optimized for MLX, facilitating efficient local inference.
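Assuming the `mlx-lm` package is installed from PyPI, the model can also be exercised directly from the command line via the CLI that ships with `mlx-lm`; the prompt below is illustrative:

```shell
# Install the MLX LM toolkit (requires Apple Silicon)
pip install mlx-lm

# One-off generation; the model is downloaded from the Hub on first run
mlx_lm.generate --model weber50432/lora-Meta-Llama-3.1-8B-Instruct \
  --prompt "Explain LoRA fine-tuning in one sentence." \
  --max-tokens 128
```

Because the weights are fetched on first use, the initial invocation takes noticeably longer than subsequent runs.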

Usage and Integration

This model is designed for developers working within the MLX ecosystem. It can be loaded and used for text generation tasks, including conversational AI, via the mlx_lm library: a typical workflow loads the model and tokenizer, applies the tokenizer's chat template for instruction-following, and generates a response, making the model straightforward to integrate into MLX-based applications.
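A minimal sketch of that workflow using the standard mlx-lm Python API (`load`, `apply_chat_template`, `generate`); the prompt is illustrative, and running this requires Apple Silicon plus a one-time model download:

```python
from mlx_lm import load, generate

# Load model weights and tokenizer from the Hugging Face Hub
model, tokenizer = load("weber50432/lora-Meta-Llama-3.1-8B-Instruct")

prompt = "Summarize the Llama 3.1 architecture in two sentences."

# Wrap the raw prompt in the model's chat template so the
# instruction-tuned model sees the format it was trained on
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# Generate a completion; verbose=True streams tokens to stdout
response = generate(model, tokenizer, prompt=prompt, verbose=True)
```

The same `load`/`generate` pair works for any MLX-format model on the Hub, so swapping in a different checkpoint only requires changing the repo id.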