Nenavaden par: pogostost besede v korpusu in pri uporabniških poizvedbah [An Unusual Pair: Word Frequency in the Corpus and in User Queries]

Trap-Jensen, Lars; Lorentzen, Henrik; Sørensen, Nicolai Hartvig

doi:10.4312/slo2.0.2014.2.94-113

Cited by 5 publications

(1 citation statement)

References 3 publications

“…The dictionary in question is a big general one with more than hundred thousand lemmata and the conclusion may therefore not be representative for dictionaries with a more reduced lemma stock as the ones to which Kilgarriff (2013) refers. However, research into logfiles by other scholars confirms another of Bergenholtz and Norddahl's (2012) conclusions, namely that there is a certain, and therefore lexicographical relevant, discrepancy between the most frequent words in a corpus and the words most frequently looked up in dictionaries; see De Schryver et al 2006 and Trap-Jensen et al (2014). This last conclusion implies that it would be better to start a lexicographical project with a reduced lemma stock with lemmata selected from logfiles instead of a corpus, and then use the method recommended by Bergenholtz and Johnsen (2005) and De Schryver (2013), among others, to supplement the lemma list with additional lemmata that appear in the logfiles once the dictionary has been published online.…”

Section: Empirical Basis (mentioning)

confidence: 93%

New Insights in the Design and Compilation of Digital Bilingual Lexicographical Products: The Case of the Diccionarios Valladolid-UVa

Fuertes-Olivera

Tarp

Sepstrup

2018

LEXI

This contribution deals with a new digital English-Spanish-English lexicographical project that started as an assignment from the Danish high-tech company Ordbogen A/S which signed a contract with the University of Valladolid (Spain) for designing and compiling a digital lexicographical product that is economically and commercially feasible and can be used for various purposes in connection with its expansion into new markets and the launching of new tools and services which make use of lexicographical data. The article presents the philosophy underpinning the project, highlights some of the innovations introduced, e.g. the use of logfiles for compiling the initial lemma list and the order of compilation, and illustrates a compilation methodology which starts by assuming the relevance of new concepts, i.e. object and auxiliary languages instead of target and source languages. The contribution also defends the premise that the future of e-lexicography basically rests on a close cooperation between research centers and high-tech companies which assures the adequate use of disruptive technologies and innovations.

Language-Internal Neologisms and Anglicisms: Dealing with New Words and Expressions in The Danish Dictionary

Trap-Jensen

2020

Dictionaries

The Relationship Between Dictionary Look-up Frequency and Corpus Frequency Revisited: A Log-File Analysis of a Decade of User Interaction with a Swahili-English Dictionary

De Schryver

Wolfer

Lew

2019

gema

In an earlier publication it was claimed that there is no useful relationship between Swahili-English dictionary look-up frequencies and the occurrence frequencies for the same wordforms in Swahili-English corpora, at least not beyond the top few thousand wordforms. This result was challenged using data for German by a different team of researchers using an improved methodology. In the present article the original Swahili-English data is revisited, using ten years' worth of it rather than just two, and using the improved methodology. We conclude that there is indeed a positive relationship. In addition, we show that online dictionary look-up behaviour is remarkably similar across languages, even when, as in our case, one is dealing with languages from very dissimilar language families. Furthermore, online dictionaries turn out to have minimum look-up success rates, below which they simply cannot go. These minima are language-sensitive and vary depending on the regularity of the searched-for entries, but are otherwise constant no matter the size of randomly sampled dictionaries. Corpus-informed sampling always improves on any random method. Lastly, from the point of view of the graphical user interface, we argue that the average user of an online bilingual dictionary is better served with a single search box, rather than separate search boxes for each dictionary side.
