Pclanglais/POntAvignon-4b

Text generation · Model size: 4B · Quant: BF16 · Context length: 32k · Published: Apr 8, 2026 · License: apache-2.0 · Architecture: Transformer · Concurrency cost: 1

Pclanglais/POntAvignon-4b is a 4 billion parameter reasoning model based on Qwen3-4B, designed specifically to annotate French theater programmes from the Festival d'Avignon. It extracts structured Linked Art JSON-LD entities from raw markdown using chain-of-thought reasoning, parsing complex French theatrical vocabulary and typographic conventions to produce accurate, structured data for a performing-arts ontology extension.


POntAvignon-4b: Specialized French Theater Programme Annotation Model

Pclanglais/POntAvignon-4b is a 4 billion parameter model, built upon Qwen/Qwen3-4B, and fine-tuned using Pleias' Baguettotron SYNTH-syntax. Its core function is to annotate French theater programmes from the Festival d'Avignon (1947–present), transforming raw markdown into structured Linked Art JSON-LD entities. The model processes programmes with a context length of 16k tokens and achieves a 97% valid JSON rate on a held-out test set, with a token accuracy of 96.6%.

Key Capabilities

  • Structured Data Extraction: Extracts 7 distinct Linked Art entity types (e.g., PropositionalObject for abstract works, Activity for productions/performances, LinguisticObject for source texts).
  • Chain-of-Thought Reasoning: Employs `<think>` tags to generate dense reasoning traces, explicitly naming tasks, engaging with document structure, and resolving ontological boundaries before outputting JSON-LD.
  • French Theatrical Expertise: Handles French theatrical vocabulary, BnF role mapping, and historical typographic conventions.
  • Ontology Alignment: Targets the Linked Art Performing Arts extension (v0.9), incorporating BnF role vocabulary, deterministic content-derived IDs, and source attribution for every extracted fact.
  • Robust Training: Trained on 12,507 samples derived from ~1,400 Festival d'Avignon programmes (1971–2022), using a mix of Claude Sonnet and Gemma 12B backreasoning for trace generation.
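
A completion in this style can be post-processed by splitting the reasoning trace from the JSON-LD payload. The sketch below assumes the output layout described above (a `<think>` block followed by a JSON object); the function name and the example completion are illustrative, not taken from the model card.

```python
import json
import re

def parse_model_output(raw: str) -> dict:
    """Split a POntAvignon-style completion into its reasoning trace and
    the JSON-LD payload. Assumes a <think>...</think> block followed by
    one JSON object, as the model card describes."""
    match = re.search(r"<think>(.*?)</think>", raw, re.DOTALL)
    trace = match.group(1).strip() if match else ""
    payload = raw[match.end():] if match else raw
    return {"trace": trace, "entities": json.loads(payload)}

# Illustrative completion, not real model output.
example = """<think>Task: annotate programme. The work is abstract,
so it maps to PropositionalObject.</think>
{"@type": "PropositionalObject", "_label": "La Temp\u00eate"}"""

result = parse_model_output(example)
```

In practice the 97% valid-JSON rate means `json.loads` can still raise on a few percent of outputs, so callers should wrap the parse in error handling.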

Good for

  • Digital Humanities Research: Ideal for researchers working with historical French theater archives, particularly those from the Festival d'Avignon.
  • Knowledge Graph Construction: Facilitates the creation of structured knowledge graphs for performing arts by converting unstructured programme data into Linked Art JSON-LD.
  • Specialized NLP Tasks: Demonstrates strong performance in highly specialized information extraction tasks requiring deep domain understanding and complex reasoning.
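
For knowledge-graph construction, the deterministic content-derived IDs mentioned under Ontology Alignment are what make repeated extraction runs merge cleanly: the same entity always receives the same identifier. The exact ID scheme used by POntAvignon is not documented here; the hash-based recipe below is an illustrative assumption, including the `urn:pontavignon:` prefix.

```python
import hashlib

def content_id(label: str, entity_type: str) -> str:
    """Derive a deterministic identifier from entity content, so the same
    programme yields the same IDs across runs. Hypothetical scheme for
    illustration only: type + label hashed with SHA-256."""
    digest = hashlib.sha256(f"{entity_type}:{label}".encode("utf-8")).hexdigest()
    return f"urn:pontavignon:{entity_type.lower()}:{digest[:12]}"

id_a = content_id("La Tempête", "PropositionalObject")
id_b = content_id("La Tempête", "PropositionalObject")
```

Because the ID depends only on content, two independent extraction passes over the same programme produce identical nodes that deduplicate trivially when loaded into a graph store.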

Limitations

  • Specialized Scope: Primarily trained on Festival d'Avignon programmes; performance on other festivals or non-French theatrical traditions may vary.
  • Language Dependency: French-centric in its understanding of text, roles, and conventions.
  • Contextual Truncation: Large cast/crew lists might be truncated near the context limit.
  • Date Inference: Relies on filenames to infer the year when it is not explicitly stated in the programme.