Proceedings of the 1st ACM Workshop on Patent Information Retrieval 2008
DOI: 10.1145/1458572.1458574

Large-scale, parallel automatic patent annotation

Abstract: When researching new product ideas or filing new patents, inventors need to retrieve all relevant pre-existing know-how and/or to exploit and enforce patents in their technological domain. However, this process is hindered by a lack of richer metadata, which, if present, would allow more powerful concept-based search to complement the current keyword-based approach. This paper presents our approach to automatic patent enrichment, tested in large-scale, parallel experiments on USPTO and EPO documents. It starts by …

Cited by 18 publications (24 citation statements)
References 7 publications
“…However, such an annotation system needs to tackle the ambiguity problem if it is to be successfully used in the domain of quantities and units. We know of two existing annotation systems that target the domain of quantities and units [7,1], and our research can be seen as a continuation of these efforts. The results of these systems are good (over 90% F-measure), but they target "clean" datasets such as patent specifications, or focus on part of the total problem, such as detecting units only.…”
Section: Introduction (mentioning)
confidence: 99%
“…GAS provides a straightforward mechanism for running applications, created with the GATE framework, as web services that carry out various NLP tasks. In practical applications we have tested a wide range of services such as named entity recognition (based on the freely-available ANNIE system Cunningham et al 2002), ontology population (Maynard et al 2009), patent processing (Agatonovic et al 2008), and automatic adjudication of multiple annotation layers in corpora.…”
Section: GATE Annotation Services (mentioning)
confidence: 99%
“…Recent projects have increasingly faced the problem of running GATE-based text processing on terabyte datasets [21]. At the same time, the multi-tier service-oriented architecture of GATE Teamware, coupled with its centralized workflow engine, has made its deployment and administration too complex and error prone for many researchers.…”
Section: Large-scale Text Mining and Compute Clouds (mentioning)
confidence: 99%
“…Additional experiments, not detailed here owing to space constraints, showed that processing time scales linearly with the number of tweets, on The news and Twitter datasets were annotated for named entities with the standard ANNIE entity annotation pipeline [11], deployed as SaaS within GATECloud.net. For the patents dataset, we reused a pre-existing text-processing pipeline [21] that recognizes patent-specific types, including references to other patents, scientific publications, measurement expressions, patent sections, claims, examples, references to figures and tables.…”
Section: Use Cases and Experiments (mentioning)
confidence: 99%