Pclanglais/Brahe: An Analytical LLM for Multilingual Literature
Brahe is a 13-billion-parameter analytical large language model, fine-tuned from Llama-13B and developed specifically for the computational humanities. Its primary function is to analyze a literary text and generate a list of up to twenty potential annotations.
Key Capabilities
- Detailed Text Annotation: Brahe can identify and annotate various aspects of a text, including:
  - Summary: A concise overview of the text.
  - Tone: General tonality (e.g., humorous, tragic).
  - Genre: Specific literary categories (e.g., detective fiction, romance).
  - Literary Form: The mode of writing (e.g., description of a place, conversation, stream of consciousness).
  - Trope: Identification of literary clichés.
  - Enunciation: Who is speaking (e.g., first-person, omniscient narrator).
  - Narrative Arc: How the action unfolds (e.g., suspense, dramatic tension).
  - Character Identification: Active and mentioned characters.
  - Time and Place Settings: Absolute and fuzzy time and place, as well as historical period.
- Multilingual Support: Trained on 8,000 literary excerpts, half in English and half in other languages (primarily French, German, and Italian). Because the Llama-13B base model is itself multilingual, Brahe has also been shown to work on languages absent from its training corpus, such as Gascon Occitan.
- Confidence-Based Annotation: Annotations are generated only when the model is sufficiently confident, which is meant to improve output quality.
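Because an annotation appears only when the model is confident, downstream code should probably treat every field as optional. The following is a minimal sketch, not the model's documented interface: it assumes the raw output is plain text with one `Label: value` line per annotation, which is not specified here. The function name `parse_annotations` and the label names in `KNOWN_LABELS` are hypothetical, chosen to mirror the capability list above.

```python
from typing import Dict, Optional

# Hypothetical label names mirroring the capability list; the labels the
# model actually emits may differ and should be checked against real output.
KNOWN_LABELS = [
    "Summary", "Tone", "Genre", "Literary form", "Trope",
    "Enunciation", "Narrative arc", "Characters", "Time setting",
    "Place setting",
]


def parse_annotations(raw: str) -> Dict[str, Optional[str]]:
    """Parse assumed 'Label: value' lines into a dict.

    Every known label is present in the result; labels the model did not
    emit (for example because it was not confident) map to None.
    """
    result: Dict[str, Optional[str]] = {label: None for label in KNOWN_LABELS}
    lookup = {label.lower(): label for label in KNOWN_LABELS}
    for line in raw.splitlines():
        # Split on the first colon only, so values may contain colons.
        label, sep, value = line.partition(":")
        key = lookup.get(label.strip().lower())
        if sep and key is not None and value.strip():
            result[key] = value.strip()
    return result


# Invented sample text, for illustration only.
sample = (
    "Summary: A detective questions a reluctant witness.\n"
    "Tone: Tragic\n"
    "Genre: Detective fiction\n"
)
parsed = parse_annotations(sample)
```

Here `parsed["Tone"]` holds the emitted value, while fields absent from the sample, such as `parsed["Trope"]`, stay `None`, so callers can distinguish "not annotated" from an empty string.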
Good For
- Computational Humanities Projects: Ideal for researchers and developers working on large-scale literary analysis.
- Literary Scholars: Provides deep analytical insights into textual characteristics.
- Textual Data Mining: Useful for extracting structured information from unstructured literary data.
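For the data-mining use case, per-excerpt annotations can be tallied across a corpus. This is a rough sketch under stated assumptions: the annotations have already been parsed into dictionaries, the keys `genre` and `tone` are hypothetical, and the records shown are invented for illustration. The helper name `tally_field` is likewise not part of any published tooling.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def tally_field(records: Iterable[Dict[str, Optional[str]]], field: str) -> Counter:
    """Count the values of one annotation field across many excerpts.

    Records where the field is missing or None are skipped, since the model
    may omit annotations it is not confident about.
    """
    counts: Counter = Counter()
    for record in records:
        value = record.get(field)
        if value:
            # Normalize case and whitespace so variants are counted together.
            counts[value.strip().lower()] += 1
    return counts


# Invented records, one per annotated excerpt.
records = [
    {"genre": "Detective fiction", "tone": "Tragic"},
    {"genre": "Romance"},
    {"genre": "detective fiction", "tone": None},
]

genre_counts = tally_field(records, "genre")
tone_counts = tally_field(records, "tone")
```

Skipping absent fields rather than counting them as a category keeps the tallies limited to what the model actually asserted; whether that is the right choice depends on the study design.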
Brahe is designed as a companion to Epstein, a generative AI model for creating new literary texts. Both models are named after characters from Daniele Del Giudice's novel Atlante occidentale.