chulcher/Octen-Embedding-8B-mlx
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Feb 28, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

chulcher/Octen-Embedding-8B-mlx is an 8 billion parameter embedding model, pre-converted for Apple Silicon (MLX framework) to eliminate the 30-minute conversion step required by the original Octen-Embedding-8B. This model excels in embedding tasks, ranking #1 on MTEB/RTEB benchmarks with a score of 0.8045, surpassing commercial embedding APIs. It is designed for efficient local execution on macOS with M-series chips, providing fast text embeddings.


Octen-Embedding-8B-mlx Overview

This repository provides pre-converted MLX weights for the Octen-Embedding-8B model, specifically optimized for Apple Silicon. The primary purpose is to streamline deployment by offering ready-to-use weights, bypassing the approximately 30-minute conversion process and 32GB temporary disk space requirement of the original model.

Key Capabilities & Features

  • Optimized for Apple Silicon: Directly runnable on M1/M2/M3/M4 chips using the MLX framework.
  • High Performance: Achieves a #1 ranking on the MTEB/RTEB benchmark with a score of 0.8045, outperforming many commercial embedding APIs.
  • Efficient Embedding: Provides fast text embedding generation, with typical latencies of 50-200ms per text on Apple Silicon.
  • OpenAI-Compatible Endpoint: Can be served via octen-embeddings-server to expose a /v1/embeddings endpoint.

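The OpenAI-compatible endpoint means any standard embeddings client can talk to the server. The sketch below shows a minimal Python client assuming `octen-embeddings-server` is running on `http://localhost:8000` (the host, port, and exact model name are assumptions, not documented here); it builds the standard `/v1/embeddings` request payload and includes a cosine-similarity helper for comparing the returned vectors.

```python
import json
import math
import urllib.request


def build_embeddings_request(texts, model="Octen-Embedding-8B-mlx"):
    """Build a payload in the OpenAI /v1/embeddings request shape."""
    return {"model": model, "input": texts}


def fetch_embeddings(texts, base_url="http://localhost:8000"):
    """POST texts to the server's OpenAI-compatible embeddings endpoint.

    The base URL is a placeholder; adjust it to wherever
    octen-embeddings-server is actually listening.
    """
    body = json.dumps(build_embeddings_request(texts)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Each item carries an "index" field; sort to preserve input order.
    return [item["embedding"]
            for item in sorted(payload["data"], key=lambda d: d["index"])]


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Because the request and response follow the OpenAI schema, existing SDKs pointed at the local base URL should also work unchanged.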
Hardware Requirements

  • CPU: Apple Silicon (M1/M2/M3/M4)
  • RAM: 20 GB or more
  • Disk: Approximately 16 GB for model weights
  • OS: macOS 13+

When to Use This Model

This model is ideal for developers and applications requiring high-quality, fast, and locally-executed text embeddings on Apple Silicon hardware. Its top-tier benchmark performance makes it suitable for various natural language processing tasks where robust semantic representations are crucial.
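A typical use of those semantic representations is ranking documents against a query. The toy sketch below assumes embeddings have already been produced by the model (real vectors would be high-dimensional; the 3-d vectors here are stand-ins) and ranks documents by cosine similarity to the query.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted by descending similarity."""
    scores = [(i, cosine_similarity(query_vec, v))
              for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy 3-d vectors standing in for real embeddings from the model.
query = [0.9, 0.1, 0.0]
docs = [
    [0.8, 0.2, 0.1],   # close to the query
    [0.0, 0.1, 0.9],   # unrelated
    [0.7, 0.0, 0.2],   # somewhat close
]
ranking = rank_documents(query, docs)  # most similar document first
```

For larger corpora, the same scores are usually computed in one batched matrix product rather than per-pair loops.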