The article presents the results of a survey on dictionary use in Europe, focusing on general monolingual dictionaries. It is the broadest survey of dictionary use to date, covering close to 10,000 dictionary users (and non-users) in nearly thirty countries. It covers varied user groups, going beyond the students and translators who have tended to dominate such studies thus far. The survey was delivered via an online survey platform, in language versions specific to each target country. It was completed by 9,562 respondents, over 300 respondents per country on average. The survey consisted of a general section, which was translated and presented to all participants, and country-specific sections for a subset of 11 countries, which were drafted by collaborators at the national level. The present report covers the general section.
Introduction
Research into dictionary use has become increasingly important in recent years. In contrast to 15 years ago, new findings in this area are presented every year, e.g. at every Euralex or eLex conference. These studies range from questionnaire or log file studies to smaller-scale studies focussing on eye tracking, usability, or other aspects of dictionary use measurable in a lab. For an overview of different studies,
Abstract. This article focuses on the creation of version 1.4 of the Estonian module of the tool Good Dictionary Example, or GDEX (Kilgarriff et al. 2008). GDEX is a tool that helps to automatically identify corpus sentences suitable for use as dictionary example sentences. GDEX modules have so far been created for English, Slovene, Dutch, Portuguese, Spanish, Japanese and Estonian. The article first explains the general working principles of the tool. It then turns to the statistical analysis of the parameters used to identify example sentences and to the setting of parameter values. Based on an evaluation of the parameter values and a comparison of the different modules, a new version 1.4 of the Estonian module is proposed.
Institute of the Estonian Language
This paper reports on an assessment task carried out among students of Tallinn University and the University of Tartu, who speak Estonian at B2-C1 proficiency level, and among lexicographers working at the Institute of the Estonian Language. The purpose of the task was to determine whether, according to these two types of annotators, authentic and unedited corpus sentences would be suitable as example sentences for learners' dictionaries at B2-C1 level. The results of the assessment task were also intended to help evaluate the output of version 1.4 of the Estonian module of GDEX (GDEX 1.4), which is used to choose and display web sentences in the Institute's new language portal Sõnaveeb. GDEX (Good Dictionary Example) is a function of the corpus query system Sketch Engine, designed to find optimal example sentence candidates in large corpora.
The results of the assessment task confirmed three hypotheses: 1) authentic corpus sentences need to be filtered before they are displayed to end-users; 2) GDEX 1.4 can identify good example candidates in corpora and filter out inappropriate candidates; 3) example sentences compiled by lexicographers are suitable example sentences. Both types of annotators considered as many as 96% of the dictionary examples, and 85% of the corpus sentences chosen as good examples by GDEX 1.4, to be suitable example sentences. Only 6% of the sentences discarded by GDEX 1.4 were considered suitable, meaning that 94% of the bad candidates had been filtered out successfully. As for unfiltered corpus sentences, 60% were considered unsuitable. When asked why they considered a sentence unsuitable, the annotators most often answered that the sentence includes anaphora and hence needs more context, or that it is colloquial, too long or too short.
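The rejection reasons reported by the annotators (anaphora, too long, too short) can be illustrated with a minimal filtering sketch. This is not the GDEX 1.4 configuration: the function names, the length thresholds and the English word list below are hypothetical, chosen only to show how such criteria might be applied to a sentence.

```python
import re

# Hypothetical thresholds and word list, chosen for illustration only;
# they are not taken from GDEX or from the paper.
MIN_TOKENS = 5
MAX_TOKENS = 25
ANAPHORA_OPENERS = {"he", "she", "it", "they", "this", "that", "these", "those"}


def rejection_reasons(sentence: str) -> list[str]:
    """Return the reasons (possibly none) for rejecting a sentence."""
    tokens = re.findall(r"\w+", sentence.lower())
    reasons = []
    if len(tokens) < MIN_TOKENS:
        reasons.append("too short")
    if len(tokens) > MAX_TOKENS:
        reasons.append("too long")
    # Crude anaphora check: a sentence that opens with a pronoun or
    # demonstrative probably depends on preceding context.
    if tokens and tokens[0] in ANAPHORA_OPENERS:
        reasons.append("anaphora")
    return reasons


def is_candidate(sentence: str) -> bool:
    """A sentence is kept as an example candidate if nothing rejects it."""
    return not rejection_reasons(sentence)
```

For instance, `rejection_reasons("He left.")` flags the sentence as both too short and anaphoric, while a self-contained sentence of moderate length passes. A real filter would also need a language-specific word list and some way of detecting colloquial vocabulary, which this sketch leaves out.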
Despite the unquestionable academic interest in corpus-based approaches
to language education, the use of corpora by teachers in their everyday
practice is still not very widespread. One way to promote usage of corpora
in language teaching is by making pedagogically appropriate corpora,
labelled with different types of problems (for instance, sensitive content,
offensive language, structural problems), so that teachers can select
authentic examples according to their needs. Because manually labelling
corpora is extremely time-consuming, we propose to use crowdsourcing for
this task. After a first exploratory phase, we are currently developing a
multimode, multilanguage game in which players first identify problematic
sentences and then classify them.