Large-scale, parallel automatic patent annotation

Agatonović, Milan; Aswani, Niraj; Bontcheva, Kalina; Cunningham, Hamish; Heitz, Thomas; Li, Yaoyong; Roberts, Ian; Tablan, Valentin

doi:10.1145/1458572.1458574

Cited by 18 publications

(24 citation statements)

References 7 publications


“…However, such an annotation system needs to tackle the ambiguity problem if it is to be successfully used in the domain of quantities and units. We know of two existing annotation systems that target the domain of quantities and units [7,1], and our research can be seen as a continuation of these efforts. The results of these systems are good (over 90% F-measure), but they target "clean" datasets such as patent specifications, or focus on part of the total problem, such as detecting units only.…”

Section: Introduction (mentioning)

confidence: 99%

Converting and Annotating Quantitative Data Tables

Assem, Rijgersberg, Wigham, et al. 2010

Lecture Notes in Computer Science


Abstract. Companies, governmental agencies and scientists produce a large amount of quantitative (research) data, consisting of measurements ranging from e.g. the surface temperatures of an ocean to the viscosity of a sample of mayonnaise. Such measurements are stored in tables in e.g. spreadsheet files and research reports. To integrate and reuse such data, it is necessary to have a semantic description of the data. However, the notation used is often ambiguous, making automatic interpretation and conversion to RDF or other suitable format difficult. For example, the table header cell "f (Hz)" refers to frequency measured in Hertz, but the symbol "f" can also refer to the unit farad or the quantities force or luminous flux. Current annotation tools for this task either work on less ambiguous data or perform a more limited task. We introduce new disambiguation strategies based on an ontology, which allows to improve performance on "sloppy" datasets not yet targeted by existing systems.

“…GAS provides a straightforward mechanism for running applications, created with the GATE framework, as web services that carry out various NLP tasks. In practical applications we have tested a wide range of services such as named entity recognition (based on the freely-available ANNIE system Cunningham et al 2002), ontology population (Maynard et al 2009), patent processing (Agatonovic et al 2008), and automatic adjudication of multiple annotation layers in corpora.…”

Section: Gate Annotation Services (mentioning)

confidence: 99%

GATE Teamware: a web-based, collaborative text annotation framework

Bontcheva, Cunningham, Roberts, et al. 2013

Lang Resources & Evaluation

Self Cite


This paper presents GATE Teamware, an open-source, web-based, collaborative text annotation framework. It enables users to carry out complex corpus annotation projects involving distributed annotator teams. Different user roles are provided (annotator, manager, administrator) with customisable user interface functionalities, in order to support the complex workflows and user interactions that occur in corpus annotation projects. Documents may be pre-processed automatically, so that human annotators can begin with text that has already been pre-annotated, which makes them more efficient. The user interface is simple to learn, aimed at non-experts, and runs in an ordinary web browser, without the need for additional software installation. GATE Teamware has been evaluated through the creation of several gold standard corpora and internal projects, as well as through external evaluation in commercial and EU text annotation projects. It is


“…Recent projects have increasingly faced the problem of running GATE-based text processing on terabyte datasets [21]. At the same time, the multi-tier service-oriented architecture of GATE Teamware, coupled with its centralized workflow engine, has made its deployment and administration too complex and error prone for many researchers.…”

Section: Large-scale Text Mining and Compute Clouds (mentioning)

confidence: 99%

“…Additional experiments, not detailed here owing to space constraints, showed that processing time scales linearly with the number of tweets, on The news and Twitter datasets were annotated for named entities with the standard ANNIE entity annotation pipeline [11], deployed as SaaS within GATECloud.net. For the patents dataset, we reused a pre-existing text-processing pipeline [21] that recognizes patent-specific types, including references to other patents, scientific publications, measurement expressions, patent sections, claims, examples, references to figures and tables.…”

Section: Use Cases and Experiments (mentioning)

confidence: 99%

GATECloud.net: a platform for large-scale, open-source text processing on the cloud

Tablan, Roberts, Cunningham, et al. 2013

Phil. Trans. R. Soc. A.

Self Cite


Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research: GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost-benefit analysis and usage evaluation.
