Proceedings of the 2nd International Workshop on Web as Corpus - WAC '06 2006
DOI: 10.3115/1628297.1628298

Web-based frequency dictionaries for medium density languages

Abstract: Frequency dictionaries play an important role both in psycholinguistic experiment design and in language technology. The paper describes a new, freely available, web-based frequency dictionary of Hungarian that is being used for both purposes, and the language-independent techniques used for creating it.

Cited by 25 publications (19 citation statements)
References 16 publications (11 reference statements)
“…The corpora are evaluated through the extraction and analysis of wordlists. Other projects that have used web crawls in order to create frequency lists include Kornai et al (2006), who work on Hungarian, and Emerson and O'Neil (2006) on Chinese. A particularly impressive web-derived frequency list is the Google terabyte n-gram collection, made publicly available in 2006 (Brants and Franz 2006).…”
Section: Related Work (mentioning)
confidence: 99%
“…As in Hungarian all the possible constituent orders are grammatical, such a preference could only be attributed to differences in frequency of occurrence. Although no data is available on frequency distributions of different word orders, we made a frequency count of relative pronouns with different case markers available in the Hungarian Webcorpus (Halácsy et al, 2004; Kornai et al, 2006), showing that the nominative form ‘aki’ is by an order of magnitude more frequent than the accusative form ‘akit’; while frequency differences are smaller for relative pronouns for non-human referents, occurrences of nominative forms still dominate (see Table 1). …”
Section: Processing Theories and Their Application To Hungarian (mentioning)
confidence: 99%
“…For a full description of the database, see Halácsy et al (2004) and Kornai et al (2006). We included categories that were not associated with each other (either semantically or phonetically) and that were themselves of moderate frequency.…”
Section: Methods (mentioning)
confidence: 99%