Jackrong/Gemopus-4-E4B-it

Vision | Concurrency Cost: 1 | Model Size: 7.9B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Apr 8, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

Jackrong/Gemopus-4-E4B-it is a 7.9 billion parameter instruction-tuned model derived from the Gemma-4-E4B-it base model and optimized for fast local inference on edge devices such as smartphones and thin-and-light laptops. It adds human preference alignment, giving it a more natural, less "Wikipedia-like" conversational tone and better contextual awareness than its base model. It is designed for high-frequency local text processing tasks such as copywriting assistance, code completion, formatting, and summary extraction, where privacy and low latency matter most.

Gemopus-4-E4B-it: Edge-Optimized Instruction Model

Jackrong/Gemopus-4-E4B-it is a 7.9 billion parameter instruction-tuned model built on the Gemma-4-E4B-it base and engineered for efficient local inference on edge devices. The motivation is to bring capable AI assistance to personal devices such as iPhones, tablets, and MacBooks: running high-frequency basic reasoning tasks on the device instead of in the cloud keeps latency low and keeps user data local.
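
Whether the model fits on a given device depends largely on how much memory its weights need. The sketch below is a back-of-the-envelope estimate from the listed model size (7.9B parameters) and quantization (FP8, one byte per parameter). The function name and the precision table are illustrative assumptions, and the result covers weights only: the KV cache, activations, and runtime overhead are not included, so real usage will be higher.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: every parameter is stored at the same precision.
BYTES_PER_PARAM = {"fp8": 1, "fp16": 2, "fp32": 4}


def weight_footprint_gb(num_params: float, quant: str) -> float:
    """Return the approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    if quant not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {quant!r}")
    return num_params * BYTES_PER_PARAM[quant] / 1e9


if __name__ == "__main__":
    params = 7.9e9  # "Model Size: 7.9B" from the listing
    for quant in ("fp8", "fp16"):
        print(f"{quant}: ~{weight_footprint_gb(params, quant):.1f} GB of weights")
```

Under these assumptions the FP8 weights come to roughly 7.9 GB, about half of what the same parameter count would need at FP16.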

Key Optimizations & Features

This model addresses several limitations of the original Gemma-4-E4B-it, which was noted for its "Wikipedia tone," stiff translations, and overly rigid safety disclaimers. Gemopus-4-E4B-it underwent human preference alignment targeting three areas:

  • Native Tone Adaptation: Removes the "machine translation tone" and stiff, manual-style phrasing in favor of a more personal and natural language style.
  • Deep Contextual Awareness: Improves the model's ability to pick up implicit needs in multi-turn dialogues, making its responses more insightful and warmer.
  • Structural Readability: Improves output layout by using Markdown to produce clear, hierarchically organized answers that are easy to scan.

Performance on Edge Devices

On Apple Silicon, which uses a unified memory architecture, the model is reported to reach the following inference speeds:

  • iPhone 17 Pro Max: Achieves 45-60 tokens/s.
  • MacBook Air (M3/M4) with MLX: Reaches 90-120 tokens/s.
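
To put these throughput figures in terms of waiting time, the sketch below converts a tokens-per-second range into a best-case and worst-case generation time for a response of a given length. The function name and the 360-token response length are illustrative assumptions; the estimate covers token generation only and ignores prompt processing and model load time.

```python
# Convert a reported tokens/s range into a generation-time range.
REPORTED_TPS = {
    "iPhone 17 Pro Max": (45, 60),
    "MacBook Air (M3/M4) with MLX": (90, 120),
}


def generation_time_range(num_tokens: int, min_tps: float, max_tps: float):
    """Return (best_case_seconds, worst_case_seconds) for num_tokens."""
    if num_tokens < 0 or min_tps <= 0 or max_tps < min_tps:
        raise ValueError("expected num_tokens >= 0 and 0 < min_tps <= max_tps")
    # The fastest rate gives the shortest time, the slowest rate the longest.
    return num_tokens / max_tps, num_tokens / min_tps


if __name__ == "__main__":
    response_tokens = 360  # hypothetical response length
    for device, (low, high) in REPORTED_TPS.items():
        best, worst = generation_time_range(response_tokens, low, high)
        print(f"{device}: {best:.1f}-{worst:.1f} s for {response_tokens} tokens")
```

At the reported rates, a 360-token response would take about 6 to 8 seconds on the iPhone and about 3 to 4 seconds on the MacBook Air.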

Ideal Use Cases

Gemopus-4-E4B-it is best suited to serve as a local assistant for high-frequency text processing. Typical scenarios include:

  • Daily copywriting assistance.
  • Code completion and formatting.
  • Summary extraction.
  • Privacy-sensitive or latency-critical tasks.

Limitations

Because of its smaller parameter count, the model's world knowledge and deep logical reasoning are not comparable to those of larger cloud-based models. It may hallucinate when asked about obscure domains or niche knowledge, or when given complex multi-step mathematical problems.