zenlm/zen-nano

Text Generation | Concurrency Cost: 1 | Model Size: 0.8B | Quant: BF16 | Ctx Length: 32k | Published: Sep 26, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights

Zen Nano is an ultra-lightweight 0.6 billion parameter causal language model developed by the Zen AI Team (Hanzo AI) and optimized for edge devices and mobile deployment. It supports English and Chinese with a 32K token context length and targets fast, low-power inference directly on resource-constrained hardware such as smartphones and IoT devices.


Zen Nano: Ultra-Lightweight Model for Edge AI

Zen Nano is a compact 0.6 billion parameter causal language model developed by the Zen AI Team (Hanzo AI), engineered for deployment on edge devices and mobile platforms. It offers a 32K token context window and supports both English and Chinese.

Key Capabilities & Features

  • Ultra-Lightweight: At just 0.6B parameters, it's ideal for environments with limited resources.
  • High Efficiency: Achieves 44,000 tokens/sec on an M3 Max (MLX) and 8,000 tokens/sec on an iPhone 15 Pro, with memory usage as low as 0.3 GB (Q2_K).
  • Multilingual Support: Capable in both English and Chinese.
  • Flexible Formats: Available in PyTorch, MLX, and GGUF (Q2_K to F16) for broad compatibility.
  • Abliteration: Applies an 'abliteration' process that removes refusal behaviors by nullifying the "refusal direction" in the model's residual stream. This supports unrestricted research and leaves safety management to the application layer.
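
The abliteration bullet describes nullifying a direction in the residual stream. The sketch below illustrates that idea on toy NumPy arrays; it is not Zen Nano's actual pipeline. Estimating the direction as a difference of mean activations is an assumption (a common choice, not confirmed by this page), and the function names, array shapes, and random data are hypothetical.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean activations.

    Assumption: difference-of-means is used here for illustration only.
    Inputs have shape (n_samples, d_model).
    """
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_direction(acts, direction):
    """Remove the component of each activation along `direction`."""
    coeffs = acts @ direction                  # shape (n_samples,)
    return acts - np.outer(coeffs, direction)  # shape (n_samples, d_model)

def orthogonalize_weights(W, direction):
    """Return (I - d d^T) W for a matrix W of shape (d_model, d_in).

    After this, W @ x has no component along `direction` for any x.
    """
    return W - np.outer(direction, direction @ W)

# Toy data standing in for residual-stream activations (hypothetical).
rng = np.random.default_rng(0)
harmless = rng.normal(size=(32, 16))
harmful = rng.normal(size=(32, 16)) + 2.0 * np.eye(16)[0]

d = refusal_direction(harmful, harmless)
cleaned = ablate_direction(harmful, d)

W = rng.normal(size=(16, 8))   # stand-in for a matrix writing to the stream
W_ablated = orthogonalize_weights(W, d)
```

Ablating activations acts at inference time, while orthogonalizing weights bakes the same projection into the model so no runtime hook is needed. Which of the two Zen Nano uses is not stated here.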

Ideal Use Cases

  • Edge AI: Running AI tasks directly on devices without cloud dependency.
  • Mobile Applications: Powering chatbots and AI assistants on smartphones.
  • IoT Devices: Providing intelligence to internet-of-things hardware.
  • Resource-Constrained Environments: Where power and computational resources are limited.
  • Real-time Inference: For applications requiring immediate responses.
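
For the resource-constrained and real-time use cases above, a rough budget can be worked out from the figures quoted on this page. The sketch below is back-of-the-envelope arithmetic only. The helper names, the 200-token reply length, and the choice of 16 and 4 bits per weight are illustrative assumptions. It ignores KV cache, activations, and runtime overhead, and it assumes the quoted tokens/sec figures apply to generation, which the page does not specify.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

def generation_time_ms(n_tokens: int, tokens_per_sec: float) -> float:
    """Time to emit n_tokens at a steady throughput, in milliseconds."""
    return n_tokens / tokens_per_sec * 1000.0

N_PARAMS = 0.6e9       # parameter count quoted in the description
REPLY_TOKENS = 200     # hypothetical reply length

bf16_gb = weight_memory_gb(N_PARAMS, 16)      # about 1.2 GB
four_bit_gb = weight_memory_gb(N_PARAMS, 4)   # about 0.3 GB

iphone_ms = generation_time_ms(REPLY_TOKENS, 8_000)    # about 25 ms
m3_max_ms = generation_time_ms(REPLY_TOKENS, 44_000)   # under 5 ms
```

The real footprint of a GGUF quantization such as Q2_K depends on its effective bits per weight and per-block metadata, which this page does not give, so these numbers are order-of-magnitude estimates rather than a reproduction of the quoted 0.3 GB figure.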