chulcher/Octen-Embedding-8B-mlx
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Feb 28, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

chulcher/Octen-Embedding-8B-mlx is an 8 billion parameter embedding model, pre-converted for Apple Silicon (MLX framework) to eliminate the 30-minute conversion step required by the original Octen-Embedding-8B. This model excels in embedding tasks, ranking #1 on MTEB/RTEB benchmarks with a score of 0.8045, surpassing commercial embedding APIs. It is designed for efficient local execution on macOS with M-series chips, providing fast text embeddings.


Octen-Embedding-8B-mlx Overview

This repository provides pre-converted MLX weights for the Octen-Embedding-8B model, specifically optimized for Apple Silicon. The primary purpose is to streamline deployment by offering ready-to-use weights, bypassing the approximately 30-minute conversion process and 32GB temporary disk space requirement of the original model.

Key Capabilities & Features

  • Optimized for Apple Silicon: Directly runnable on M1/M2/M3/M4 chips using the MLX framework.
  • High Performance: Achieves a #1 ranking on the MTEB/RTEB benchmark with a score of 0.8045, outperforming many commercial embedding APIs.
  • Efficient Embedding: Provides fast text embedding generation, with typical latencies of 50-200ms per text on Apple Silicon.
  • OpenAI-Compatible Endpoint: Can be served via octen-embeddings-server to expose a /v1/embeddings endpoint.

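The OpenAI-compatible endpoint means any standard embeddings client can talk to the server. The sketch below shows a minimal Python client assuming `octen-embeddings-server` is running on `http://localhost:8000` (the host, port, and exact model name are assumptions, not documented here); it builds the standard `/v1/embeddings` request payload and includes a cosine-similarity helper for comparing the returned vectors.

```python
import json
import math
import urllib.request


def build_embeddings_request(texts, model="Octen-Embedding-8B-mlx"):
    """Build a payload in the OpenAI /v1/embeddings request shape."""
    return {"model": model, "input": texts}


def fetch_embeddings(texts, base_url="http://localhost:8000"):
    """POST texts to the server's OpenAI-compatible embeddings endpoint.

    The base URL is a placeholder; adjust it to wherever
    octen-embeddings-server is actually listening.
    """
    body = json.dumps(build_embeddings_request(texts)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Each item carries an "index" field; sort to preserve input order.
    return [item["embedding"]
            for item in sorted(payload["data"], key=lambda d: d["index"])]


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Because the request and response follow the OpenAI schema, existing SDKs pointed at the local base URL should also work unchanged.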
Hardware Requirements

  • CPU: Apple Silicon (M1/M2/M3/M4)
  • RAM: 20 GB or more
  • Disk: Approximately 16 GB for model weights
  • OS: macOS 13+

When to Use This Model

This model is ideal for developers and applications requiring high-quality, fast, and locally-executed text embeddings on Apple Silicon hardware. Its top-tier benchmark performance makes it suitable for various natural language processing tasks where robust semantic representations are crucial.
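A typical use of those semantic representations is ranking documents against a query. The toy sketch below assumes embeddings have already been produced by the model (real vectors would be high-dimensional; the 3-d vectors here are stand-ins) and ranks documents by cosine similarity to the query.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted by descending similarity."""
    scores = [(i, cosine_similarity(query_vec, v))
              for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy 3-d vectors standing in for real embeddings from the model.
query = [0.9, 0.1, 0.0]
docs = [
    [0.8, 0.2, 0.1],   # close to the query
    [0.0, 0.1, 0.9],   # unrelated
    [0.7, 0.0, 0.2],   # somewhat close
]
ranking = rank_documents(query, docs)  # most similar document first
```

For larger corpora, the same scores are usually computed in one batched matrix product rather than per-pair loops.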