Name: mhenrichsen/context-aware-splitter-1b API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: mhenrichsen

Context-Aware Splitter (CAS) for RAG

mhenrichsen/context-aware-splitter-1b is a specialized 1-billion-parameter model built for Retrieval Augmented Generation (RAG). It splits input text into chunks that are each contextually coherent and readable on their own. This matters in RAG because a retrieved chunk reaches the generator without its surrounding text.

Key Capabilities

Context-aware splitting: Unlike fixed-size or rule-based splitters, CAS reads the input text and chooses split points based on its content.
Word count adherence: Splits are sized to a user-defined word count, and adjacent splits may overlap where this preserves meaning.
Structured output: Returns a dictionary containing a list of text splits and an inferred topic for the entire input.
Danish language focus: Trained on 12.3k Danish texts (13.4M tokens), making it particularly effective for Danish content.
Alpaca prompt format: Uses the Alpaca instruction format to structure the input and the response.
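The Alpaca prompt and the dictionary output can be sketched in Python. This is a minimal sketch under stated assumptions. The template is the standard Alpaca template with an "### Input:" section. The instruction wording and the "splits" and "topic" key names are hypothetical, because this page does not give the exact training prompt or output schema. build_prompt and parse_response are illustrative helpers and are not part of the model's or Featherless.ai's API.

```python
import ast
import json

# Standard Alpaca template (the variant with an "### Input:" section).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def build_prompt(text: str, word_count: int) -> str:
    # HYPOTHETICAL instruction wording: the exact phrasing used in training
    # is not documented here, so check the full model card before relying on it.
    instruction = (
        f"Split the text into self-contained chunks of about {word_count} words "
        "and infer the topic of the text."
    )
    return ALPACA_TEMPLATE.format(instruction=instruction, input=text)


def parse_response(raw: str) -> dict:
    # ASSUMPTION: the response is a JSON object or a Python dict literal with
    # the hypothetical keys "splits" (list of strings) and "topic" (string).
    raw = raw.strip()
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to Python literal syntax (e.g. single-quoted strings);
        # literal_eval evaluates literals only, never arbitrary code.
        result = ast.literal_eval(raw)
    if not isinstance(result, dict):
        raise ValueError("expected a dictionary")
    splits = result.get("splits")
    topic = result.get("topic")
    if not isinstance(splits, list) or not all(isinstance(s, str) for s in splits):
        raise ValueError("'splits' must be a list of strings")
    if not isinstance(topic, str):
        raise ValueError("'topic' must be a string")
    return {"splits": splits, "topic": topic}
```

The validation step rejects malformed generations early. Output from a generative model is not guaranteed to be well-formed, so it is better to catch problems before the splits reach an index.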

Good for

Optimizing RAG pipelines: Pre-processing documents into semantically rich chunks for improved retrieval accuracy.
Handling Danish text: Specifically fine-tuned for the nuances of the Danish language.
Ensuring contextual integrity: Maintaining the meaning and readability of text segments after splitting.
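In a RAG pipeline, the parsed output would typically become one record per split before indexing. The sketch below assumes the parsed result is a dict with hypothetical "splits" and "topic" keys. Chunk and to_chunks are illustrative names, not an existing API. Because the topic is inferred once for the whole input, it is attached to every chunk as metadata. Each split's word count is also recorded, and splits over a given budget are flagged, since a generative model may not hit the requested word count exactly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Chunk:
    # One retrievable unit for a vector store; the fields are illustrative.
    doc_id: str
    index: int
    text: str
    metadata: dict = field(default_factory=dict)


def to_chunks(doc_id: str, parsed: dict, max_words: int) -> list[Chunk]:
    """Turn a parsed {"splits": [...], "topic": ...} result into chunk records.

    Each chunk carries the document-level topic as metadata, its word count,
    and a flag marking splits whose word count exceeds max_words.
    """
    chunks = []
    for i, text in enumerate(parsed["splits"]):
        n_words = len(text.split())  # whitespace-delimited word count
        chunks.append(
            Chunk(
                doc_id=doc_id,
                index=i,
                text=text,
                metadata={
                    "topic": parsed["topic"],
                    "word_count": n_words,
                    "over_budget": n_words > max_words,
                },
            )
        )
    return chunks
```

Oversized splits are flagged rather than re-split automatically. Cutting them mechanically would undo the context-aware split points, so the choice between accepting, re-prompting, or falling back to another splitter is left to the pipeline.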
