A Modular Metadata Extraction System for Born-Digital Articles

Tkaczyk, Dominika; Bolikowski, Łukasz; Czeczko, Artur; Rusek, K.

doi:10.1109/das.2012.4

Cited by 15 publications

(10 citation statements)

References 15 publications


“…Tkaczyk et al. [12] present a comprehensive system for extracting metadata from scholarly articles, based on analysis of the document structure from header to footer. This work used the iText and LibSVM libraries and the following algorithms: Docstrum, bottom-up heuristic-based algorithms, KMeans clustering, and conditional random fields.…”

Section: Related Work (unclassified)

Extracción semiautomática de metadatos en documentos no estructurados utilizando procesamiento de lenguaje natural y propiedades tipográficas

Rendón, Torres‐Moreno, Sierra et al. 2019, RCS


Today there are various systems capable of extracting metadata from digital documents. However, the lack of a defined structure for how metadata is laid out in documents from a digital art library presents a major problem; this is generally due to the style that each author or publisher chooses for the cover pages of the document. Although there are software tools that extract metadata, they focus only on structured documents such as journals and scientific articles. Metadata is nothing more than structured information about data, that is, information about information, or data about data. This paper introduces the use of natural language processing techniques and typographic information from the document text to extract metadata such as title, authors, publisher, and date of publication. The results obtained in an evaluation with unstructured digital documents indicate the potential of the proposed approach, which is capable of producing good results in metadata extraction.



“…As for computational methods and text-mining procedures, several approaches may be considered; according to the literature, one of the most relevant solutions is the use of the hidden Markov model (Hetzner, 2008; Ojokoh, Zhang, & Tang, 2011). Besides this method, practitioners also apply other approaches developed on the basis of artificial intelligence, generally realized with various machine learning algorithms (Tkaczyk, Bolikowski, Czeczko, & Rusek, 2012; Tkaczyk, Szostek, Fedoryszak, Dendek, & Bolikowski, 2015). In a wide-ranging survey, Sarawagi (2007) systematizes automated information extraction methods, highlighting in several places what is known about citations as a special type of information.…”

Section: Theoretical Background (unclassified)

A Magyar Pedagógia folyóirat tudománymetriai elemzése a hivatkozási szokások és a hivatkozási hálózatok tükrében

Nagy, Molnár 2018, Magyar Pedagógia


“…Citation parser is a part of CERMINE - a metadata and content extraction tool [11]. Pawlak, Zdzisław (1982).…”

Section: Citation Parsing (mentioning)

confidence: 99%

Large Scale Citation Matching Using Apache Hadoop

Fedoryszak, Tkaczyk, Bolikowski 2013, Research and Advanced Technology for Digital Libraries

Self Cite


During citation matching, links from bibliography entries to the referenced publications are created. Such links indicate topical similarity between the linked texts, are used in assessing the impact of the referenced document, and improve navigation in the user interfaces of digital libraries. In this paper we present a citation matching method and show how to scale it up to handle large amounts of data using appropriate indexing and the MapReduce paradigm in the Hadoop environment.

