2015
DOI: 10.1002/asi.23379

Information retrieval from historical newspaper collections in highly inflectional languages: A query expansion approach

Abstract: The aim of the study was to test whether query expansion by approximate string matching methods is beneficial in retrieval from historical newspaper collections in a language rich with compounds and inflectional forms (Finnish). First, approximate string matching methods were used to generate lists of index words most similar to contemporary query terms in a digitized newspaper collection from the 1800s. Top index word variants were categorized to estimate the appropriate query expansion ranges in the retrieva…
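The expansion step described in the abstract can be illustrated with a minimal sketch. It is not the study's method: the abstract does not name the matching algorithms used, so Python's standard-library `difflib` similarity ratio stands in for them here. The function name `expand_query`, the toy `index_words` list (including the simulated OCR error "sanomalchti"), and the `max_variants` and `cutoff` values are all assumptions chosen for illustration.

```python
from difflib import get_close_matches


def expand_query(term, index_words, max_variants=5, cutoff=0.7):
    """Return the query term followed by its most similar index words.

    Assumption: difflib's SequenceMatcher ratio is used as a stand-in
    similarity measure; the study's actual methods are not given here.
    """
    # Up to max_variants index words whose similarity to the term
    # is at least `cutoff`, best matches first.
    variants = get_close_matches(term, index_words, n=max_variants, cutoff=cutoff)
    # Keep the original term first and avoid listing it twice.
    return [term] + [v for v in variants if v != term]


# Hypothetical index vocabulary: inflected forms, one OCR-damaged form,
# and two unrelated words.
index_words = [
    "sanomalehti",
    "sanomalehden",
    "sanomalchti",
    "sanomalehtiä",
    "kirjasto",
    "hevonen",
]

expanded = expand_query("sanomalehti", index_words)
print(expanded)
```

The expanded list would then be submitted as a disjunction of terms to the retrieval system. How many variants to keep is exactly the "query expansion range" the abstract says was estimated empirically, so `max_variants` and `cutoff` here are placeholders rather than recommended values.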

Cited by 13 publications (14 citation statements)
References 41 publications
“…Users of the Digi collection have complained about the poor OCR of the collection relatively little, but some of them have reported curious search results and been annoyed by the OCR quality (Hölttä, 2016; Kettunen, Pääkkönen, Koistinen, 2016). Basing on the empirical search results with the evaluation collection derived from a small subset of the whole Digi material (Järvelin et al 2016), it is evident that search results in the Digi collection itself are not optimal, and better OCR quality would probably improve them.…”
Section: Discussion (mentioning)
confidence: 99%
“…Orthography of Finnish was already reasonably stable in the mid-19th century, although there were phenomena that differ from modern language (cf. table 1 in Järvelin et al. 2016).…”
Section: Orthographical Variation and Out-of-vocabulary Words (mentioning)
confidence: 95%
“…Kettunen and Pääkkönen (2018) have previously presented the possibilities offered by the newspaper collection in the journal Informaatiotutkimus. Järvelin et al. (2016) give an account of the challenges of historical information retrieval when using Finnish-language text collections.…”
Section: Examined Collections (unclassified)
“…The literature contains a large number of proposals for topic labeling. The interested reader is directed to the following references [16][17][18][19]. Initially, the number of topics of interest and one carefully chosen seed word for each topic are assumed to be available as input. This choice of the number of topics and of the seeds can be influenced by knowledge of the context or of the application domain of interest.…”
Section: Extraction of the Topic Structure (unclassified)