ellamind/propella-1-0.6b

Parameters: 0.8B
Precision: BF16
Context length: 40,960 tokens
Released: Jan 10, 2026
License: apache-2.0
Overview

ellamind/propella-1-0.6b is a compact 0.6-billion-parameter model in the propella-1 family, developed by ellamind. It specializes in annotating text documents to support large-scale data curation for LLM training. Despite its small size, it offers high throughput and accuracy, supports 57 languages, and handles a variety of document formats.

Key Capabilities

  • Comprehensive Annotation: Annotates documents across 18 distinct properties, including content integrity, educational value, reasoning indicators, and time-sensitivity, organized into six categories.
  • Multilingual Support: Capable of processing text in 57 different languages.
  • Versatile Input: Handles diverse text formats such as web pages, PDFs, code, and mathematical content.
  • High Throughput: Optimized for fast inference, achieving 39.9 documents/second on an H100 GPU, making it efficient for large datasets.
  • Structured Output: Emits annotations as JSON objects that conform to a predefined schema, so outputs can be parsed and validated automatically.
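Because the model emits schema-conformant JSON, a downstream pipeline can parse and sanity-check each annotation before using it. The sketch below is a minimal, hypothetical example: the property names (`language`, `educational_value`, `time_sensitive`) and the raw output string are illustrative assumptions, not the model's actual schema.

```python
import json

# Hypothetical raw model output for one document; the field names and
# values below are illustrative, not propella-1's real schema.
raw_output = '{"language": "en", "educational_value": 4, "time_sensitive": false}'

# Parse the JSON annotation.
annotation = json.loads(raw_output)

# Minimal structural check against an assumed schema fragment:
# each expected key must be present with the expected type.
expected_types = {"language": str, "educational_value": int, "time_sensitive": bool}
for key, typ in expected_types.items():
    assert isinstance(annotation[key], typ), f"{key} missing or wrong type"

print(annotation["educational_value"])  # → 4
```

In practice one would validate against the model's published schema (e.g. with a JSON Schema validator) rather than a hand-written type map.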

Good For

  • LLM Training Data Curation: Ideal for filtering, selecting, and curating vast amounts of text data for training large language models.
  • Automated Content Evaluation: Useful for automatically assessing various aspects of text content, from quality and value to safety and compliance.
  • Research and Development: A valuable tool for researchers exploring data annotation and quality control in multilingual contexts.
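The curation use case above amounts to filtering a corpus by annotation values. A minimal sketch, assuming hypothetical annotation fields (`educational_value`, `content_integrity` on a 1-5 scale) and thresholds of the user's choosing; the actual property names and scales are defined by the model's schema:

```python
# Toy corpus: each document carries an (assumed) propella-1 annotation.
corpus = [
    {"text": "Photosynthesis converts light into chemical energy.",
     "annotation": {"educational_value": 5, "content_integrity": 5}},
    {"text": "CLICK HERE to win!!!",
     "annotation": {"educational_value": 1, "content_integrity": 2}},
]

def keep(doc, min_edu=3, min_integrity=3):
    """Keep a document only if its annotation scores meet both thresholds."""
    ann = doc["annotation"]
    return (ann["educational_value"] >= min_edu
            and ann["content_integrity"] >= min_integrity)

curated = [doc for doc in corpus if keep(doc)]
print(len(curated))  # → 1
```

The same pattern scales to streaming pipelines: annotate each document once, then apply cheap threshold filters as many times as needed for different training mixes.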