Overview
ellamind/propella-1-1.7b is a 1.7-billion-parameter model from the propella-1 family, developed by ellamind. It is a small multilingual LLM engineered for efficient text document annotation, primarily to support the filtering, selection, and curation of LLM training data at scale. The model can be served in fp8 precision, enabling fast inference with minimal accuracy loss.
Key Capabilities
- Comprehensive Annotation: Annotates documents across 18 distinct properties, categorized into Core Content, Classification, Quality & Value, Audience & Purpose, Safety & Compliance, and Geographic Relevance.
- Multilingual Support: Processes text in 57 languages.
- Versatile Input Handling: Supports various text formats including web pages, PDFs, code, and mathematical content.
- High Throughput: Optimized for high-throughput inference, achieving 39.1 documents/second on an H100 GPU in fp8 mode.
- Structured Output: Generates annotations as JSON objects conforming to a predefined schema with enumerated values.
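Because the model emits JSON conforming to a schema with enumerated values, a downstream pipeline can validate each annotation before using it for filtering. The sketch below illustrates this; the property names and enum values shown are hypothetical placeholders, not the actual propella-1 schema (which defines 18 properties).

```python
import json

# Hypothetical schema fragment for illustration only; the real
# propella-1 schema defines 18 properties with their own enum values.
ALLOWED = {
    "document_type": {"web_page", "pdf", "code", "math"},
    "quality": {"low", "medium", "high"},
}

def validate(raw: str) -> dict:
    """Parse a model response and check enum-valued fields against the schema."""
    ann = json.loads(raw)
    for key, allowed in ALLOWED.items():
        if key in ann and ann[key] not in allowed:
            raise ValueError(f"{key}={ann[key]!r} not in {sorted(allowed)}")
    return ann

# A well-formed annotation passes through unchanged.
print(validate('{"document_type": "code", "quality": "high"}'))
```

Rejecting malformed or out-of-enum responses at this stage keeps invalid annotations out of large-scale curation runs.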
Performance and Evaluation
The propella-1-1.7b model achieves an overall performance score of 0.737, evaluated against Gemini-3-Pro annotations as ground truth. Performance holds up in fp8 inference mode, with only a minor score difference relative to bf16. Evaluation uses quadratic weighted kappa (QWK) for ordinal properties, F1 for binary properties, and intersection over union (IoU) for multi-select properties.
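The three metric families map naturally onto the three property types. A minimal pure-Python sketch of their standard definitions follows; the exact implementations used in the evaluation are not specified in this card, so these serve only to make the metric choices concrete.

```python
from collections import Counter

def qwk(y_true, y_pred, n_levels):
    """Quadratic weighted kappa for ordinal labels in 0..n_levels-1."""
    n = len(y_true)
    observed = [[0.0] * n_levels for _ in range(n_levels)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_t, hist_p = Counter(y_true), Counter(y_pred)
    num = den = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            w = (i - j) ** 2 / (n_levels - 1) ** 2  # quadratic disagreement weight
            expected = hist_t[i] * hist_p[j] / n    # chance agreement
            num += w * observed[i][j]
            den += w * expected
    return 1.0 - num / den

def f1_binary(y_true, y_pred):
    """F1 score for a binary property (labels are truthy/falsy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def iou(true_set, pred_set):
    """Intersection over union for a multi-select property."""
    if not true_set and not pred_set:
        return 1.0
    return len(true_set & pred_set) / len(true_set | pred_set)
```

QWK rewards predictions that land close to the true ordinal level rather than scoring all mistakes equally, which suits graded properties such as document quality; IoU gives partial credit when a predicted label set overlaps the reference set.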
Good for
- Automated large-scale data curation and filtering for LLM training datasets.
- Rapid, structured annotation of diverse text documents.
- Applications requiring fast and accurate multilingual text analysis for content quality and relevance.