ellamind/propella-1-4b

Parameters: 4B
Precision: BF16
Context length: 40960 tokens
Released: Jan 10, 2026
License: apache-2.0

propella-1-4b: A Specialized Multilingual Annotation Model

ellamind's propella-1-4b is a 4-billion-parameter model from the propella-1 family, engineered for efficient and accurate text document annotation. It is part of a series of small, fast, and highly multilingual LLMs (including 0.6B and 1.7B variants) optimized for data curation tasks.

Key Capabilities

  • Comprehensive Annotation: Annotates documents across 18 distinct properties, categorized into Core Content, Classification, Quality & Value, Audience & Purpose, Safety & Compliance, and Geographic Relevance. This includes metrics like content integrity, information density, educational value, reasoning indicators, and PII presence.
  • Multilingual Support: Capable of processing text in 57 languages, making it suitable for diverse global datasets.
  • High Throughput: Designed for high-throughput inference, with the 4B model achieving 27.0 docs/s on an H100 GPU (fp8), significantly reducing the time needed to annotate large datasets.
  • Structured Output: Generates annotations as strict JSON objects with enumerated values, ensuring consistency and ease of integration.
  • Context Length: Supports a 64k context length, with a recommendation to truncate documents at 50k characters for optimal performance.
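The truncation recommendation and strict-JSON output described above can be sketched as a small pre/post-processing helper. This is an illustrative sketch, not the official ellamind tooling: the property names below (`educational_value`, `pii_presence`) are hypothetical stand-ins for two of the 18 real schema properties.

```python
import json

MAX_CHARS = 50_000  # recommended truncation point for input documents

# Hypothetical key names for illustration only; the real schema defines
# 18 properties across Core Content, Classification, Quality & Value,
# Audience & Purpose, Safety & Compliance, and Geographic Relevance.
EXPECTED_KEYS = {"educational_value", "pii_presence"}

def truncate_document(text: str, max_chars: int = MAX_CHARS) -> str:
    """Truncate a document to the recommended character limit."""
    return text[:max_chars]

def parse_annotation(raw: str) -> dict:
    """Parse the model's strict-JSON annotation and verify expected keys."""
    annotation = json.loads(raw)
    missing = EXPECTED_KEYS - annotation.keys()
    if missing:
        raise ValueError(f"annotation missing keys: {sorted(missing)}")
    return annotation
```

Because the model emits strict JSON with enumerated values, a plain `json.loads` plus a key check like this is usually enough to gate downstream pipelines.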

Good For

  • LLM Training Data Curation: Ideal for filtering, selecting, and curating large-scale datasets for training other language models.
  • Automated Content Evaluation: Assessing various aspects of text documents, from quality and safety to audience relevance, without human annotators.
  • Multilingual Data Processing: Handling and annotating text content across a wide array of languages efficiently.
  • High-Volume Annotation Tasks: Leveraging its speed and optimized inference (including fp8 support) for processing millions of documents.
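For high-volume use cases, the reported 27.0 docs/s figure (4B model, one H100, fp8) gives a quick way to budget annotation runs. A minimal sketch, assuming throughput scales roughly linearly with GPU count (an assumption, not a measured result):

```python
DOCS_PER_SECOND = 27.0  # reported 4B throughput on a single H100 (fp8)

def annotation_hours(num_docs: int, gpus: int = 1,
                     docs_per_second: float = DOCS_PER_SECOND) -> float:
    """Estimated wall-clock hours to annotate num_docs documents.

    Assumes linear scaling across GPUs, which real deployments
    may not achieve exactly.
    """
    return num_docs / (docs_per_second * gpus) / 3600.0
```

At the reported rate, a million documents works out to roughly 10 hours on a single GPU, which is the kind of estimate this helper produces.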