2006
DOI: 10.3758/bf03193893

E-Hitz: A word frequency list and a program for deriving psycholinguistic statistics in an agglutinative language (Basque)

Abstract: Agglutinative languages (i.e., languages in which words are formed by joining morphemes together; e.g., Hungarian, Turkish, Basque) are an excellent testing ground for psycholinguistic research in some of the key issues in lexical access. At present, there are several useful databases for computing a number of relevant psycholinguistic statistics in nonagglutinative languages (for English, see Coltheart, 1981; Davis, 2005; for French, see New, Pallier, Brysbaert, & Ferrand, 2004; for Spanish, see Davis & Perea…

citations
Cited by 70 publications
(66 citation statements)
references
References 29 publications
“…The 'apertium' translation database (Tyers, Sánchez-Martínez, Ortiz-Rojas, & Forcada, 2010) was used to define an initial set of Basque-Spanish noun translations. Next, using the citation-form phonological transcription in lexical databases for Spanish ('B-Pal', Davis & Perea, 2005) and Basque ('E-Hitz', Perea et al., 2006), pairs of nouns were chosen such that: the members of each pair had the same number of syllables but distinct initial phonemes, the Levenshtein distance between the CV transcriptions of each pair was less than three, the Levenshtein distance between the phonological transcriptions was greater than three (to avoid cognates), the absolute difference in log10 frequency was less than 2, both the Spanish and Basque frequencies-per-million were greater than 5, both the Spanish and the Basque words had a noun part-of-speech tag, and that the absolute difference in the number of phonemes was not greater than 1. Note that the lexical databases for Spanish and Basque are compiled from written sources.…”
Section: Materials and Design
mentioning
confidence: 99%
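The selection criteria quoted above can be sketched as a single filter over candidate translation pairs. This is only an illustration, not the citing authors' code: the `Entry` fields, the one-character-per-phoneme transcription, the vowel set used for the CV skeleton, and the function names are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


@dataclass
class Entry:
    # Hypothetical record layout; real database exports will differ.
    phonemes: str            # assumed: one character per phoneme
    n_syllables: int
    freq_per_million: float
    pos: str


VOWELS = set("aeiou")  # assumption: a five-vowel inventory


def cv_skeleton(phonemes):
    """Map each phoneme to 'C' or 'V'."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)


def is_valid_pair(es, eu):
    """Apply the quoted criteria to a Spanish entry and a Basque entry."""
    return (
        es.n_syllables == eu.n_syllables
        and es.phonemes[0] != eu.phonemes[0]
        and levenshtein(cv_skeleton(es.phonemes), cv_skeleton(eu.phonemes)) < 3
        and levenshtein(es.phonemes, eu.phonemes) > 3
        # The frequency floor is checked first so log10 never sees zero.
        and es.freq_per_million > 5
        and eu.freq_per_million > 5
        and abs(math.log10(es.freq_per_million)
                - math.log10(eu.freq_per_million)) < 2
        and es.pos == "noun"
        and eu.pos == "noun"
        and abs(len(es.phonemes) - len(eu.phonemes)) <= 1
    )
```

In practice the candidate pairs would come from the translation list and the two lexical databases; the filter itself does not depend on how they are loaded.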
“…Standalone software packages are also available for Spanish and other languages that provide subsets of the properties in EsPal (Davis, 2005; Davis & Perea, 2005; New, Pallier, Brysbaert, & Ferrand, 2004; Perea et al., 2006). However, given the size of the corpora (discussed below), some of the calculations for some of the properties take up to a week on a standard PC, so a precomputed set of properties is preferred.…”
mentioning
confidence: 99%
“…Such a result would demonstrate that orthographic markedness modulates bilingual lexical access, suggesting that lexical-orthographic representations of words from two languages that share basic sub-lexical orthographic distributional information are stored closer in lexical semantic memory than representations of words with highly distinctive language-selective orthotactics. Perea et al, 2006). We used the length-corrected orthographic Levenshtein distance in order to restrict the cross-linguistic similarity between these words and their Spanish translation equivalents.…”
mentioning
confidence: 99%
“…Further research should explore not only the way in which sub-lexically marked and unmarked words are represented in the bilingual and from the LEXESP database (Sebastián-Gallés et al, 2000) for Spanish. Bigram Frequency corresponds to the logarithmic transformation of bigram frequencies dependent of word length and position taken from E-Hitz database (Perea et al, 2006) and B-Pal (Davis & Perea, 2005). Asterisks indicate significant statistical differences between Marked and Unmarked conditions.…”
mentioning
confidence: 99%
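A bigram frequency measure that depends on word length and position, as described in the last statement, might be computed along the following lines. This is a rough sketch under stated assumptions: counts are type-based (each word counted once), one is added before taking log10 to avoid the logarithm of zero, and the per-word score is the mean over the word's bigrams. None of these choices is confirmed by the source, and the function names are hypothetical.

```python
import math
from collections import Counter


def positional_bigram_counts(words):
    """Count bigrams keyed by (word length, position, bigram)."""
    counts = Counter()
    for w in words:
        for i in range(len(w) - 1):
            counts[(len(w), i, w[i:i + 2])] += 1
    return counts


def mean_log_bigram_frequency(word, counts):
    """Mean log10(count + 1) over the word's length- and position-specific bigrams."""
    keys = [(len(word), i, word[i:i + 2]) for i in range(len(word) - 1)]
    if not keys:
        return 0.0  # single-letter words have no bigrams
    return sum(math.log10(counts[k] + 1) for k in keys) / len(keys)


# Toy word list for illustration only, not taken from any database.
toy_lexicon = ["etxe", "etsi", "atze"]
toy_counts = positional_bigram_counts(toy_lexicon)
score = mean_log_bigram_frequency("etxe", toy_counts)
```

A token-based variant would add each word's corpus frequency instead of 1 when accumulating the counts.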