Corpus exploration for the extraction of specialized lexicon is a common and widely accepted method in the construction of lexical resources. However, the methodologies employed are not explicitly discussed, which makes it difficult to compare them and to determine which approaches are robust. To fill this gap, this article presents and discusses a detailed methodology for extracting specialized lexicon from corpora, combining linguistic and statistical approaches. The proposed method provides for the use of both specialized corpora and monitor corpora and includes: i) analysis of frequency data; ii) extraction of concordances and collocations; iii) extraction of textual-level information, allowing the extraction of single-word and multiword lexical units and of relevant semantic relations. The aim of the methodology is thus to produce lists of candidate specialized lexical units, together with information relevant to their description, that allow a fast and efficient final validation, maximizing the informational value of the interaction with domain experts.
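As a rough illustration of steps i) and ii), the sketch below ranks words by how much more frequent they are in a specialized corpus than in a reference corpus, and extracts simple keyword-in-context lines. It is not the authors' implementation: the function names (`tokenize`, `keyness_ratio`, `concordance`), the add-one smoothing, the frequency-ratio score, and the toy sentences are all assumptions chosen for brevity. The reference text here merely stands in for a monitor corpus.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware in Python 3,
    # so accented letters are kept inside tokens.
    return re.findall(r"\w+", text.lower())


def keyness_ratio(specialized_tokens, reference_tokens, smoothing=1.0):
    # Hypothetical scoring choice: ratio of smoothed relative frequencies.
    # Words much more frequent in the specialized corpus score highest.
    spec = Counter(specialized_tokens)
    ref = Counter(reference_tokens)
    n_spec = len(specialized_tokens)
    n_ref = len(reference_tokens)
    scores = {}
    for word, count in spec.items():
        spec_rel = (count + smoothing) / (n_spec + smoothing)
        ref_rel = (ref[word] + smoothing) / (n_ref + smoothing)
        scores[word] = spec_rel / ref_rel
    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def concordance(tokens, keyword, window=3):
    # Keyword-in-context: (left context, keyword, right context) triples.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Toy data, invented for illustration only.
specialized = tokenize("a escultura pública e a escultura urbana")
reference = tokenize("a casa e a rua e a cidade")

ranked = keyness_ratio(specialized, reference)
kwic = concordance(specialized, "escultura", window=1)
```

On real corpora a statistical measure such as log-likelihood would usually replace the plain ratio, and the ranked list would serve only as a set of candidates for expert validation, in line with the validation goal stated in the abstract.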
This paper introduces CORPORART, a bilingual corpus of Public Art. CORPORART is intended to gather, in a single collection of bilingual data, representative samples of specialized language in European Portuguese and Italian. The compilation of this corpus is part of an ongoing doctoral project that aims to integrate specialized lexical units into a pre-existing common-language resource, WordNet.PT (Marrafa et al., 2005), with a view to streamlining communication between heterogeneous interlocutors (Amaro & Mendes, 2012). Assuming that the structure of a corpus depends heavily on the goals of the investigation, this paper presents the linguistic and extralinguistic parameters adopted for the construction and organization of the corpus, as well as the criteria for text processing. In particular, we examine the notions of representativeness and comparability in light of the specificity of this case study, and outline a working-practice proposal aimed at guaranteeing these two flexible dimensions in the context of specialized languages.
This chapter presents research on the teaching and learning of Portuguese as a host language, based on the exploration of authentic informational and institutional texts aimed at migrants and refugees, and on the premise that successful host language teaching must respond to the needs of its target audience. The chapter discusses methods for defining and identifying criteria and features with which to monitor official texts for inclusiveness and bias. It provides insights into how to select real texts for use in task-based approaches to inclusive host language teaching. Starting from an analysis of a real corpus, it shows the potential and the limitations of existing inclusiveness guidelines for the assessment of real texts, as well as other issues that are still neglected. Finally, the chapter outlines future research directions toward an effective and reliable assessment, through NLP and machine learning approaches, of inclusive texts that can serve as inclusive host language teaching materials.