2011
DOI: 10.1007/978-3-642-21034-1_15

Automatic Semantic Subject Indexing of Web Documents in Highly Inflected Languages

Abstract: Structured semantic metadata about unstructured web documents can be created using automatic subject indexing methods, avoiding laborious manual indexing. A successful automatic subject indexing tool for the web should work with texts in multiple languages and be independent of the domain of discourse of the documents and controlled vocabularies. However, analyzing text written in a highly inflected language requires word form normalization that goes beyond rule-based stemming algorithms. We have test…

Citations: cited by 6 publications (6 citation statements)
References: 17 publications
“…If the amount of keywords for a document would have been 5 instead of 1 in the original annotations, the keyword would have been included in the list of generated keywords. The results are, however, similar but not fully comparable to the results of Sinkkilä et al [23] for different Finnish texts. AATOS performed well in contrast to the tools and strategies used in the study.…”
Section: Discussion (supporting)
confidence: 48%
“…Typically the entities are organized in a knowledge base or ontology. The tools use different algorithms and training data, and few comparative evaluations have been conducted to identify the conditions under which each tool is the most appropriate [27,29,17]. However, due to the algorithms used, the tools work best on full-text documents but not on a user's search keywords.…”
Section: Semantic Annotation (mentioning)
confidence: 99%
“…These allow you to process a lot of information quickly and cheaply, and also ensure the inter-indexer consistency. However automatic systems also present problems because of the complexity of natural language processing (Sinkkilä et al, 2011). Consequently the semi-automatic indexing approach is a good solution, because in addition to obviating the problems of the automatic indexing system it facilitates the task of indexers by providing suitable term suggestions (Vasuki and Cohen, 2010).…”
Section: Research Objectives (mentioning)
confidence: 99%