fairdataihub/Llama-3.1-8B-Poster-Extraction
fairdataihub/Llama-3.1-8B-Poster-Extraction is an 8-billion-parameter model based on the Llama 3.1 architecture, developed by the FAIR Data Innovations Hub. It is fine-tuned to extract structured JSON metadata from scientific conference posters, conforming to an extended DataCite-based schema. The model converts raw poster text into detailed metadata, including author information, titles, conference details, content sections, and image and table captions, and supports a 32K-token context length.
Model Overview
fairdataihub/Llama-3.1-8B-Poster-Extraction is an 8-billion-parameter model built on the Llama 3.1 architecture, developed by the FAIR Data Innovations Hub. Its primary function is to transform raw text from scientific conference posters into structured JSON metadata. The model is the core component of the poster2json Python library and powers the posters.science platform, which aims to make scientific posters Findable, Accessible, Interoperable, and Reusable (FAIR).
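To make the output format more concrete, the sketch below parses a raw model response as JSON and reports which of the top-level fields named in Key Capabilities are absent. It is a minimal illustration under stated assumptions: the field tuple EXPECTED_TOP_LEVEL, the nesting of content.sections, and the sample record are hypothetical, and this check is not a substitute for validating against the actual poster-json-schema.

```python
import json

# Hypothetical field set taken from the field names listed in this card.
# The real poster-json-schema may require different fields or nesting.
EXPECTED_TOP_LEVEL = (
    "creators",
    "titles",
    "publicationYear",
    "subjects",
    "descriptions",
    "conference",
    "content",
    "imageCaptions",
    "tableCaptions",
)


def missing_fields(raw_output: str) -> list[str]:
    """Parse a raw model response and list expected fields that are absent.

    Raises json.JSONDecodeError if the response is not valid JSON.
    """
    record = json.loads(raw_output)
    missing = [key for key in EXPECTED_TOP_LEVEL if key not in record]
    # Assumed nesting: "content.sections" means a "sections" key inside "content".
    if "content" in record:
        content = record["content"]
        if not isinstance(content, dict) or "sections" not in content:
            missing.append("content.sections")
    return missing


# Hypothetical, abbreviated response used only for illustration.
sample = json.dumps({
    "creators": [{"name": "Doe, Jane", "affiliation": ["Example University"]}],
    "titles": [{"title": "An Example Poster"}],
    "publicationYear": "2024",
    "content": {"sections": []},
})
print(missing_fields(sample))
# → ['subjects', 'descriptions', 'conference', 'imageCaptions', 'tableCaptions']
```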
Key Capabilities
- Structured Metadata Extraction: Converts poster content into a detailed JSON format based on the poster-json-schema, an extension of the DataCite Metadata Schema.
- Comprehensive Data Fields: Extracts fields such as creators (authors, affiliations), titles, publicationYear, subjects, descriptions (abstracts), conference details, content.sections, imageCaptions, and tableCaptions.
- Evaluation Results: Achieves a 100% pass rate on a validation set of 10 manually annotated scientific posters, with scores of 0.96 for Word Capture, 0.89 for ROUGE-L, and 0.93 for Number Capture.
- Integration: Designed to be used through the poster2json Python library for integration into data processing pipelines.
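Of the reported metrics, ROUGE-L has a standard definition: an F-measure built from the longest common subsequence (LCS) between a candidate and a reference token sequence. The sketch below computes ROUGE-L F1 over lowercase whitespace tokens. The tokenization and the choice of F1 (equal weighting of precision and recall) are assumptions, and the evaluation behind the reported 0.89 may differ in both. The card does not define Word Capture or Number Capture, so they are not sketched here.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0] * (len(b) + 1)
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between two strings, using lowercase whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("the model extracts poster titles",
                   "the model extracts titles and authors")
print(round(score, 4))  # → 0.7273 (LCS = 4, precision 4/5, recall 4/6)
```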
Use Cases
This model is intended for researchers and developers who need to:
- Automate the extraction of structured metadata from scientific posters.
- Populate databases or platforms (like posters.science) with FAIR-compliant poster information.
- Improve the searchability and discoverability of scientific poster content.
- Process large volumes of poster data for analysis or archiving.
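When processing large volumes of posters, a driver loop usually needs to tolerate responses that are not valid JSON. The sketch below assumes a hypothetical `extract` callable that returns the model's raw JSON string (for example, a wrapper around the model or poster2json; their real interfaces are not described here). It collects parsed records and records failures by index so they can be retried. `fake_extract` is a hypothetical stand-in used only to exercise the loop.

```python
import json
from typing import Callable, Iterable


def extract_batch(
    poster_texts: Iterable[str],
    extract: Callable[[str], str],
) -> tuple[list[dict], list[tuple[int, str]]]:
    """Apply `extract` to each poster text and split parsed records from failures.

    `extract` is a hypothetical callable returning the model's raw JSON string.
    """
    records: list[dict] = []
    failures: list[tuple[int, str]] = []
    for index, text in enumerate(poster_texts):
        raw = extract(text)
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as err:
            # Keep the poster's position and the decoder message for a later retry.
            failures.append((index, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            failures.append((index, "top-level value is not an object"))
            continue
        records.append(record)
    return records, failures


def fake_extract(text: str) -> str:
    """Hypothetical stand-in for the model: uses the first line as the title."""
    if not text.strip():
        return "not json"
    return json.dumps({"titles": [{"title": text.splitlines()[0]}]})


records, failures = extract_batch(["Poster A\nbody", "", "Poster B"], fake_extract)
print(len(records), [index for index, _ in failures])  # → 2 [1]
```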