E-Hitz: A word frequency list and a program for deriving psycholinguistic statistics in an agglutinative language (Basque)

Perea, Manuel; Urkia, Miriam; Davis, Colin J.; Agirre, Ainhoa; Laseka, Edurne; Carreiras, Manuel

doi:10.3758/bf03193893

Cited by 70 publications

(66 citation statements)

References 29 publications

“…The 'apertium' translation database (Tyers, Sánchez-Martínez, Ortiz-Rojas, & Forcada, 2010) was used to define an initial set of Basque-Spanish noun translations. Next, using the citation-form phonological transcription in lexical databases for Spanish ('B-Pal', Davis & Perea, 2005) and Basque ('E-Hitz', Perea et al., 2006), pairs of nouns were chosen such that: the members of each pair had the same number of syllables but distinct initial phonemes, the Levenshtein distance between the CV transcriptions of each pair was less than three, the Levenshtein distance between the phonological transcriptions was greater than three (to avoid cognates), the absolute difference in log10 frequency was less than 2, both the Spanish and Basque frequencies per million were greater than 5, both the Spanish and the Basque words had a noun part-of-speech tag, and the absolute difference in the number of phonemes was not greater than 1. Note that the lexical databases for Spanish and Basque are compiled from written sources.…”

Section: Materials and Design

mentioning

confidence: 99%
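The selection criteria in the quoted passage amount to a conjunction of filters applied to each candidate translation pair. The Python sketch below applies them to one Spanish-Basque pair. It is illustrative only: the `Entry` record, its field names, the `"N"` noun tag, and the representation of transcriptions as tuples of phoneme symbols are assumptions rather than the actual B-Pal or E-Hitz formats, and `levenshtein` is a standard unit-cost edit distance, not necessarily the tool the authors used.

```python
import math
from dataclasses import dataclass
from typing import Sequence

NOUN_TAG = "N"  # hypothetical part-of-speech label for nouns


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


@dataclass
class Entry:
    """Hypothetical lexical record; not the actual B-Pal or E-Hitz column layout."""
    word: str
    phonemes: tuple[str, ...]  # citation-form transcription, one symbol per phoneme
    cv: str                    # consonant-vowel skeleton, e.g. "CVCV"
    syllables: int
    freq_per_million: float
    pos: str


def is_candidate_pair(es: Entry, eu: Entry) -> bool:
    """True if a Spanish (es) and Basque (eu) translation pair meets every quoted criterion."""
    return (
        es.syllables == eu.syllables                       # same number of syllables
        and es.phonemes[0] != eu.phonemes[0]               # distinct initial phonemes
        and levenshtein(es.cv, eu.cv) < 3                  # similar CV structure
        and levenshtein(es.phonemes, eu.phonemes) > 3      # phonologically distant (no cognates)
        and es.freq_per_million > 5                        # frequency floor, checked before
        and eu.freq_per_million > 5                        # any logarithm is taken
        and abs(math.log10(es.freq_per_million)
                - math.log10(eu.freq_per_million)) < 2     # log10 frequency difference
        and es.pos == NOUN_TAG and eu.pos == NOUN_TAG      # both tagged as nouns
        and abs(len(es.phonemes) - len(eu.phonemes)) <= 1  # phoneme counts differ by at most 1
    )
```

Because `and` short-circuits, the frequency floor is evaluated before the logarithms, so a zero-frequency entry is rejected without ever reaching `math.log10`.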

Modeling accuracy as a function of response time with the generalized linear mixed effects model

Davidson, Martin

2013

Acta Psychologica

Abstract: In psycholinguistic studies using error rates as a response measure, response times (RT) are most often analyzed independently of the error rate, although it is widely recognized that they are related. In this paper we present a mixed effects logistic regression model for the error rate that uses RT as a trial-level fixed- and random-effect regression input. Production data from a translation-recall experiment are analyzed as an example. Several model comparisons reveal that RT improves the fit of the regression model for the error rate. Two simulation studies then show how the mixed effects regression model can identify individual participants for whom (a) faster responses are more accurate, (b) faster responses are less accurate, or (c) there is no relation between speed and accuracy. These results show that this type of model can serve as a useful adjunct to traditional techniques, allowing psycholinguistic researchers to examine more closely the relationship between RT and accuracy in individual subjects and better account for the variability that may be present, as well as a preliminary step to more advanced RT-accuracy modeling.

“…Standalone software packages are also available for Spanish and other languages that provide subsets of the properties in EsPal (Davis, 2005; Davis & Perea, 2005; New, Pallier, Brysbaert, & Ferrand, 2004; Perea et al., 2006). However, given the size of the corpora (discussed below), some of the calculations for some of the properties take up to a week on a standard PC, so a precomputed set of properties is preferred.…”

mentioning

confidence: 99%

EsPal: One-stop shopping for Spanish word properties

et al. 2013

Self Cite

This article introduces EsPal: a Web-accessible repository containing a comprehensive set of properties of Spanish words. EsPal is based on an extensible set of data sources, beginning with a 300 million token written database and a 460 million token subtitle database. Properties available include word frequency, orthographic structure and neighborhoods, phonological structure and neighborhoods, and subjective ratings such as imageability. Subword structure properties are also available in terms of bigrams and trigrams, biphones, and bisyllables. Lemma and part-of-speech information and their corresponding frequencies are also indexed. The website enables users either to upload a set of words to receive their properties or to receive a set of words matching constraints on the properties. The properties themselves are easily extensible and will be added over time as they become available. It is freely available from the following website: http://www.bcbl.eu/databases/espal/.

Keywords: Word frequency · Subtitles · Word recognition · Corpus linguistics · Psycholinguistics

Researchers from a wide range of disciplines (e.g., neuroscience, artificial intelligence, psychology, linguistics, and education, among others) who work in the interdisciplinary area of language research (e.g., language acquisition, language processing, language learning, bilingualism, and computational linguistics) need quick and efficient access to information about specific properties of words. For example, word frequency is a dominant factor in accounting for visual word recognition speed as measured by lexical decision times (Forster & Chambers, 1973; Monsell, 1991) and eye fixation durations during reading (Rayner, 2009). Unsurprisingly, reading behavior as measured by, for example, lexical decision, naming, fixation times, and so on is affected by a wide range of other properties of words, including orthographic neighborhood
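Orthographic neighborhood measures of the kind EsPal lists are often based on Coltheart's N: the number of words of the same length that differ from a target by exactly one letter substitution. The Python sketch below computes that classic definition over a toy word list. It is not EsPal's implementation, and other neighborhood definitions (for example, ones that also allow insertions and deletions) are not covered here.

```python
from collections.abc import Iterable


def coltheart_neighbours(target: str, lexicon: Iterable[str]) -> list[str]:
    """Words in `lexicon` of the same length as `target` that differ from it
    by exactly one letter substitution; Coltheart's N is the length of the result."""
    neighbours = set()  # a set, so duplicate lexicon entries are counted once
    for word in lexicon:
        if len(word) != len(target) or word == target:
            continue  # neighbours must be the same length and a different word
        mismatches = sum(1 for a, b in zip(word, target) if a != b)
        if mismatches == 1:
            neighbours.add(word)
    return sorted(neighbours)
```

Comparing each target against the whole lexicon like this costs work proportional to the lexicon size for every word, which is one reason precomputed property tables are attractive when the corpora are large.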

“…Such a result would demonstrate that orthographic markedness modulates bilingual lexical access, suggesting that lexical-orthographic representations of words from two languages that share basic sub-lexical orthographic distributional information are stored closer in lexical semantic memory than representations of words with highly distinctive language-selective orthotactics. […] Perea et al., 2006). We used the length-corrected orthographic Levenshtein distance in order to restrict the cross-linguistic similarity between these words and their Spanish translation equivalents.…”

mentioning

confidence: 99%
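The quoted passage does not specify how the orthographic Levenshtein distance was length-corrected. Raw edit distances are not comparable across word lengths, so one common normalization, assumed here rather than taken from the study, divides the distance by the length of the longer word. The Python sketch below implements that assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def length_corrected_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (assumed normalization).

    The result lies between 0.0 (identical strings) and 1.0, because the
    edit distance never exceeds the length of the longer string."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0
```

For example, Spanish `gato` and Basque `katu` (both "cat") differ by two substitutions over four letters, so `length_corrected_distance("gato", "katu")` returns 0.5.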

“…Further research should explore not only the way in which sub-lexically marked and unmarked words are represented in the bilingual […] and from the LEXESP database (Sebastián-Gallés et al., 2000) for Spanish. Bigram Frequency corresponds to the logarithmic transformation of bigram frequencies dependent on word length and position taken from the E-Hitz database (Perea et al., 2006) and B-Pal (Davis & Perea, 2005). Asterisks indicate statistically significant differences between Marked and Unmarked conditions.…”

mentioning

confidence: 99%
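The quoted description of bigram frequency (log-transformed and dependent on word length and position) can be sketched in Python as follows. The passage does not give the exact E-Hitz computation, so every detail here is an assumption: a bigram's count is the summed token frequency of all words of the same length that contain it at the same position, `log10(1 + f)` is used so that unattested bigrams map to 0, and a word's score is the mean over its bigrams. The function names are hypothetical.

```python
import math
from collections import defaultdict


def positional_bigram_table(freq_list: dict[str, float]) -> dict[tuple[int, int, str], float]:
    """Sum token frequencies per bigram, keyed by (word length, position, bigram)."""
    table: defaultdict[tuple[int, int, str], float] = defaultdict(float)
    for word, freq in freq_list.items():
        for i in range(len(word) - 1):
            table[(len(word), i, word[i:i + 2])] += freq
    return dict(table)


def mean_log_bigram_frequency(word: str, table: dict[tuple[int, int, str], float]) -> float:
    """Mean of log10(1 + positional bigram frequency) over the word's bigrams."""
    n = len(word)
    logs = [math.log10(1 + table.get((n, i, word[i:i + 2]), 0.0)) for i in range(n - 1)]
    return sum(logs) / len(logs) if logs else 0.0
```

Keying the table by length and position means that the same bigram contributes separate counts for, say, word-initial `tx` in six-letter words and medial `tx` in four-letter words.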

Lexical organization of language-ambiguous and language-specific words in bilinguals

Casaponsa

Duñabeitia

2016

Quarterly Journal of Experimental Psychology

Previous research has shown the importance of sublexical orthographic cues in determining the language of a given word when the two languages of a bilingual reader share the same script. In this study, we explored the extent to which cross-language sublexical characteristics of words—measured in terms of bigram frequencies—constrain selective language activation during reading. In Experiment 1, we investigated the impact of language-nonspecific and language-specific orthography in letter detection using the Reicher–Wheeler paradigm in a seemingly monolingual experimental context. In Experiment 2, we used the masked translation priming paradigm in order to better characterize the role of sublexical language cues during lexical access in bilinguals. Results show that bilinguals are highly sensitive to statistical orthographic regularities of their languages and that the absence of such cues promotes language-nonspecific lexical access, whereas their presence partially reduces parallel language activation. We conclude that language coactivation in bilinguals is highly modulated by sublexical processing and that orthographic regularities of the two languages of a bilingual are a determining factor in lexical access.
