Ildikó Pilán scite author profile

Ildikó Pilán

5 Publications

97 Citation Statements Received

92 Citation Statements Given

How they've been cited

110

How they cite others

118

Affiliations

Norwegian Computing Center, University of Gothenburg, University of Oslo

Publications

Anonymisation Models for Text Data: State of the art, Challenges and Future Directions

Lison¹, Pilán², Sánchez³ et al. 2021

This position paper investigates the problem of automated text anonymisation, which is a prerequisite for secure sharing of documents containing sensitive information about individuals. We summarise the key concepts behind text anonymisation and provide a review of current approaches. Anonymisation methods have so far been developed in two fields with little mutual interaction, namely natural language processing and privacy-preserving data publishing. Based on a case study, we outline the benefits and limitations of these approaches and discuss a number of open challenges, such as (1) how to account for multiple types of semantic inferences, (2) how to strike a balance between disclosure risk and data utility and (3) how to evaluate the quality of the resulting anonymisation. We lay out a case for moving beyond sequence labelling models and incorporating explicit measures of disclosure risk into the text anonymisation process.

Rule-based and machine learning approaches for second language sentence-level readability

Pilán¹, Volodina, Johansson 2014

We present approaches for the identification of sentences understandable by second language learners of Swedish, which can be used in automatically generated exercises based on corpora. In this work we merged methods and knowledge from machine learning-based readability research, from rule-based studies of Good Dictionary Examples and from second language learning syllabuses. The proposed selection methods have also been implemented as a module in a free web-based language learning platform. Users can use different parameters and linguistic filters to personalize their sentence search with or without a machine learning component assessing readability. The sentences selected have already found practical use as multiple-choice exercise items within the same platform. Out of a number of deep linguistic indicators explored, we found mainly lexical-morphological and semantic features informative for second language sentence-level readability. We obtained a readability classification accuracy result of 71%, which approaches the performance of other models used in similar tasks. Furthermore, during an empirical evaluation with teachers and students, about seven out of ten sentences selected were considered understandable, the rule-based approach slightly outperforming the method incorporating the machine learning model.

The SweLL Language Learner Corpus

Volodina, Granstedt, Matsson et al. 2019, NEJLT

The article presents a new language learner corpus for Swedish, SweLL, and the methodology behind it, from collection and pseudonymisation (to protect learners' personal information) to annotation adapted to second language learning. The main aim is to deliver a well-annotated corpus of essays written by second language learners of Swedish and make it available for research through a browsable environment. To that end, a new annotation tool and a new project management tool have been implemented, both with the main purpose of ensuring the reliability and quality of the final corpus. In the article we discuss the reasoning behind metadata selection and the principles of gold corpus compilation, and argue for separating normalization from correction annotation.

The Image of the Monolingual Dictionary Across Europe. Results of the European Survey of Dictionary use and Culture

Kosem, Lew, Müller-Spitzer et al. 2018

The article presents the results of a survey on dictionary use in Europe, focusing on general monolingual dictionaries. The survey is the broadest survey of dictionary use to date, covering close to 10,000 dictionary users (and non-users) in nearly thirty countries. Our survey covers varied user groups, going beyond the students and translators who have tended to dominate such studies thus far. The survey was delivered via an online survey platform, in language versions specific to each target country. It was completed by 9,562 respondents, over 300 respondents per country on average. The survey consisted of the general section, which was translated and presented to all participants, as well as country-specific sections for a subset of 11 countries, which were drafted by collaborators at the national level. The present report covers the general section.

Introduction

Research into dictionary use has become increasingly important in recent years. In contrast to 15 years ago, new findings in this area are presented every year, e.g. at every Euralex or eLex conference. These studies range from questionnaire or log file studies to smaller-scale studies focussing on eye tracking, usability, or other aspects of dictionary use measurable in a lab. For an overview of different studies,

SB@GU at the Complex Word Identification 2018 Shared Task

Alfter¹, Pilán² 2018

In this paper, we describe our experiments for the Shared Task on Complex Word Identification (CWI) 2018 (Yimam et al., 2018), hosted by the 13th Workshop on Innovative Use of NLP for Building Educational Applications (BEA) at NAACL 2018. Our system for English builds on previous work for Swedish concerning the classification of words into proficiency levels. We investigate different features for English and compare their usefulness using feature selection methods. For the German, Spanish and French data we use simple systems based on character n-gram models and show that sometimes simple models achieve comparable results to fully feature-engineered systems.
