In this paper, we present a tangible outcome of the TextLink network: a joint online database project displaying and linking existing and newly-created lexicons of discourse connectives in multiple languages. We discuss the definition and demarcation of the class of connectives that should be included in such a resource, and present the syntactic, semantic/pragmatic, and lexicographic information we collected. Further, the technical implementation of the database and the search functionality are presented. We discuss how the multilingual integration of several connective lexicons provides added value for linguistic researchers and other users interested in connectives, by allowing crosslinguistic comparison and a direct linking between discourse relational devices in different languages. Finally, we provide pointers for possible future extensions both in breadth (i.e., by adding lexicons for additional languages) and depth (by extending the information provided for each connective item and by strengthening the crosslinguistic links).
In this article, we present COPLE2, a new corpus of Portuguese FL/L2, which encompasses written and spoken data produced by foreign learners of Portuguese at the University of Lisbon. Over the past few years, learner corpus research on languages other than English has grown substantially. Our aim is to expand the learner data available for Portuguese, a less commonly taught language. We believe that COPLE2 will be a useful resource for teachers and researchers, since it will provide empirical data to: (i) identify general errors in the learning of Portuguese L2; (ii) develop textbooks and other teaching material targeting specific groups of students; (iii) design teacher training material that takes into account an analysis of the teachers' corrections. We briefly describe the work in progress on the compilation and linguistic annotation of this corpus.
Keywords: learner corpus, second language acquisition, teaching and acquisition of Portuguese FL/L2, foreign language teaching, error annotation.
This article compares the forms of address used in European and Brazilian Portuguese. We show that the system is considerably more complex in the European variety, owing to (i) a complementary distribution between tu and você according to the type of relationship between the interlocutors; (ii) a greater variety of nominal forms of address between intimates and non-intimates. In Brazilian Portuguese, by contrast, the two pronouns are not in complementary distribution. The use of você is attested in a large central area of the country, while in other regions tu and você coexist, with one form or the other predominating, and are generally used as variants.
Abstract. In this paper we present an experiment in automatically tagging a set of Portuguese modal verbs with modal information. Modality is the expression of the speaker's (or the subject's) attitude towards the content of the sentence and may be marked by lexical cues such as verbs, adverbs, and adjectives, but also by mood and tense. Here we focus exclusively on 9 verbal cues that are frequent in Portuguese and that may have more than one modal meaning. As our gold data set we use a corpus of 160,000 tokens, manually annotated according to a modality annotation scheme for Portuguese. We apply a machine learning approach to predict the modal meaning of a verb in context. This modality tagger takes into consideration all the features available from the parsed data (POS, syntactic, and semantic). The results show that the tagger improved on the baseline for all verbs and reached macro-average F-measures between 35% and 81%, depending on the modal verb and on the modal value.
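As a minimal sketch of how a macro-average F-measure can be compared against a baseline, the example below computes per-label F1 and averages it with equal weight per label. The abstract does not say which baseline was used, so the majority-class baseline here is an assumption, and the label names and toy data are hypothetical rather than taken from the annotated corpus.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Mean of per-label F1 over every label seen in gold or predicted."""
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was missed
    scores = []
    for label in labels:
        pred_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / pred_total if pred_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def majority_baseline(gold):
    """Assumed baseline: always predict the most frequent gold label."""
    most_common = Counter(gold).most_common(1)[0][0]
    return [most_common] * len(gold)


# Hypothetical modal values for four occurrences of one verb.
gold = ["epistemic", "deontic", "epistemic", "epistemic"]
predicted = ["epistemic", "deontic", "deontic", "epistemic"]

tagger_score = macro_f1(gold, predicted)
baseline_score = macro_f1(gold, majority_baseline(gold))
print(f"tagger: {tagger_score:.3f}  baseline: {baseline_score:.3f}")
```

Because the macro average weights each modal value equally, a baseline that ignores a rare value is penalised heavily for it, which is one reason scores can vary widely across verbs and modal values.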