Vision and natural language for metadata extraction from scientific PDF documents

Boukhers, Zeyd; Bouabdallah, Azeddine

doi:10.1145/3529372.3533295

Cited by 3 publications

(3 citation statements)

References 17 publications

“…Beyond transfer learning, models that move beyond translating object detection to document layout analysis tasks are those that include the text data as training features [12,14,20]. Often these models are "multi-modal" in that they draw from the fields of machine learning methods for image classification and segmentation and the processing of text with natural language processing or similar techniques [31].…”

Section: Unless the Answer Is Better Models? (mentioning)

confidence: 99%

Generalizability in Document Layout Analysis for Scientific Article Figure & Caption Extraction

Naiman

2023

Preprint

The lack of generalizability, in which a model trained on one dataset cannot provide accurate results for a different dataset, is a known problem in the field of document layout analysis. Thus, when a model is used to locate important page objects in scientific literature such as figures, tables, captions, and math formulas, the model often cannot be applied successfully to new domains. While several solutions have been proposed, including newer and updated deep learning models, larger hand-annotated datasets, and the generation of large synthetic datasets, so far there is no "magic bullet" for translating a model trained on a particular domain or historical time period to a new field. Here we present our ongoing work in translating our document layout analysis model from the historical astrophysical literature to the larger corpus of scientific documents within the HathiTrust U.S. Federal Documents collection. We use this example as an avenue to highlight some of the problems with generalizability in the document layout analysis community and discuss several challenges and possible solutions to address these issues. All code for this work is available on The Reading Time Machine GitHub repository, https://github.com/ReadingTimeMachine/htrc short conf.

“…For instance, in the case of document corpora, Natural Language Processing (NLP) techniques can be employed to extract titles and descriptions. Specifically, automatic metadata extraction techniques such as those in [5,28] can be utilized to extract metadata from each document, such as Publication Date, Author, Language, etc. This metadata can then be used to derive the metadata for the entire collection, such as Publication Range, Authors, Languages, etc.…”

Section: Automatic Metadata Extraction (mentioning)

confidence: 99%

Enhancing Data Space Semantic Interoperability through Machine Learning: a Visionary Perspective

Boukhers

Lange

Beyan

2023

Companion Proceedings of the ACM Web Conference 2023

Self Cite

Our vision paper outlines a plan to improve the future of semantic interoperability in data spaces through the application of machine learning. The use of data spaces, where data is exchanged among members in a self-regulated environment, is becoming increasingly popular. However, the current manual practices of managing metadata and vocabularies in these spaces are time-consuming, prone to errors, and may not meet the needs of all stakeholders. By leveraging the power of machine learning, we believe that semantic interoperability in data spaces can be significantly improved. This involves automatically generating and updating metadata, which results in a more flexible vocabulary that can accommodate the diverse terminologies used by different sub-communities. Our vision for the future of data spaces addresses the limitations of conventional data exchange and makes data more accessible and valuable for all members of the community. CCS Concepts: Information systems → Data exchange; Data access methods; Semantic web description languages.

“…This transformative capability has culminated in the creation of intelligent chatbots capable of learning from human interactions and providing responses that exhibit an exceptional level of subtlety [3]. More specifically, a critical aspect of LLMs is their ability to extract information from complex sources such as technical manuals, establishing them as advanced knowledge dissemination tools [12,13].…”

Section: Introduction (mentioning)

confidence: 99%

Analysis of Language-Model-Powered Chatbots for Query Resolution in PDF-Based Automotive Manuals

Medeiros

Azevedo

et al.

2023

Vehicles

In the current scenario of fast technological advancement, increasingly characterized by widespread adoption of Artificial Intelligence (AI)-driven tools, the significance of autonomous systems like chatbots has been highlighted. Such systems, which are proficient in addressing queries based on PDF files, hold the potential to revolutionize customer support and post-sales services in the automotive sector, resulting in time and resource optimization. Within this scenario, this work explores the adoption of Large Language Models (LLMs) to create AI-assisted tools for the automotive sector, assuming three distinct methods for comparative analysis. For them, broad assessment criteria are considered in order to encompass response accuracy, cost, and user experience. The achieved results demonstrate that the choice of the most adequate method in this context hinges on the selected criteria, with different practical implications. Therefore, this work provides insights into the effectiveness and applicability of chatbots in the automotive industry, particularly when interfacing with automotive manuals, facilitating the implementation of productive generative AI strategies that meet the demands of the sector.
