chulcher/Octen-Embedding-8B-mlx is an 8-billion-parameter embedding model, pre-converted for Apple Silicon (MLX framework) to eliminate the roughly 30-minute conversion step required by the original Octen-Embedding-8B. The model ranks #1 on the MTEB/RTEB benchmarks with a score of 0.8045, surpassing many commercial embedding APIs, and is designed for fast, efficient local text embedding on macOS with M-series chips.
Octen-Embedding-8B-mlx Overview
This repository provides pre-converted MLX weights for the Octen-Embedding-8B model, specifically optimized for Apple Silicon. The primary purpose is to streamline deployment by offering ready-to-use weights, bypassing the approximately 30-minute conversion process and 32GB temporary disk space requirement of the original model.
Key Capabilities & Features
- Optimized for Apple Silicon: Directly runnable on M1/M2/M3/M4 chips using the MLX framework.
- High Performance: Achieves a #1 ranking on the MTEB/RTEB benchmark with a score of 0.8045, outperforming many commercial embedding APIs.
- Efficient Embedding: Provides fast text embedding generation, with typical latencies of 50-200ms per text on Apple Silicon.
- OpenAI-Compatible Endpoint: Can be served via `octen-embeddings-server` to expose a `/v1/embeddings` endpoint.
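Because the endpoint follows the OpenAI embeddings schema, any OpenAI-compatible client can talk to it. Below is a minimal sketch using only the Python standard library; the base URL, port, and model name are assumptions (adjust them to however you launched `octen-embeddings-server`):

```python
import json
from urllib import request

# Assumed local endpoint; change host/port to match your server invocation.
API_URL = "http://localhost:8000/v1/embeddings"


def build_payload(texts, model="Octen-Embedding-8B-mlx"):
    # OpenAI-compatible request body: "input" may be a string or a list of strings.
    return {"model": model, "input": texts}


def embed(texts):
    req = request.Request(
        API_URL,
        data=json.dumps(build_payload(texts)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    # Response mirrors OpenAI's schema: body["data"][i]["embedding"] is a list of floats.
    return [item["embedding"] for item in body["data"]]
```

With the server running, `embed(["hello world"])` returns one embedding vector per input text.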
Hardware Requirements
- CPU: Apple Silicon (M1/M2/M3/M4)
- RAM: 20 GB or more
- Disk: Approximately 16 GB for model weights
- OS: macOS 13+
When to Use This Model
This model is ideal for developers and applications that need high-quality, fast, locally executed text embeddings on Apple Silicon hardware. Its top-tier benchmark performance makes it well suited to tasks that depend on robust semantic representations, such as semantic search, retrieval-augmented generation, clustering, and deduplication.
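For most of these tasks, embeddings are compared with cosine similarity. A minimal, dependency-free sketch (independent of the model itself):

```python
import math


def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); closer to 1.0 means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

For example, ranking documents against a query is just sorting by `cosine_similarity(query_vec, doc_vec)` in descending order.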