Performance Comparison of Popular Text Vectorising Models on Multi-class Email Classification

Kulkarni, Ritwik; Vintró, Mercè; Kapetanakis, Stelios; Sama, Michele

doi:10.1007/978-3-030-01054-6_41

Cited by 2 publications

(2 citation statements)

References 11 publications

“…Given that frequent manual curation of large sections of the database is not only time‐consuming but also resource‐intensive, we developed an automatic way of pruning out irrelevant articles using a neural network. Neural networks have become quite powerful at classifying text (Kowsari et al., 2019; Kulkarni et al., 2018) in the past decade. The following sections describe the procedure of building, training and testing a neural network to perform a classification task on the collected articles.…”

Section: Materials and Methods: Pipeline (mentioning)

confidence: 99%

“…Given that frequent manual curation of large sections of the database is not only time consuming but also resource intensive, we developed an automatic way of pruning out irrelevant articles using a neural network. Neural networks have become quite powerful at classifying text [Kowsari et al, 2019]; [Kulkarni et al, 2018] in the past decade.…”

Section: Neural Network Classifier (mentioning)

confidence: 99%

Automated retrieval of information on threatened species from online sources using machine learning

Kulkarni; Minin (2021). Methods Ecol Evol

Self Cite

1. As resources for conservation are limited, gathering and analyzing information from digital platforms can help investigate the global biodiversity crisis in a cost-efficient manner. Development and application of methods for automated content analysis of digital data sources are especially important in the context of investigating human-nature interactions.

2. In this study, we introduce novel application methods to automatically collect and analyze textual data on species of conservation concern from digital platforms. An end-to-end pipeline is constructed that begins by searching for and downloading news articles about species listed in Appendix I of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES), along with news articles from specific Twitter handles, and then applies natural language processing and machine learning methods to filter and retain only relevant articles. A crucial aspect here is the automatic annotation of training data, which can be challenging in many machine learning applications. A Named Entity Recognition model is then used to extract additional relevant information from each article.

3. The data collected over a one-month period included 15,088 articles focusing on 585 species listed in Appendix I of CITES. The neural network detected relevant articles with an accuracy of 95.91%, while the Named Entity Recognition model helped extract information on prices, locations, and quantities of traded animals and plants. The system generates a regularly updated database, which can be queried and analysed for various research purposes and to inform conservation decision-making.

4. The results demonstrate that natural language processing can be used successfully to extract information from digital text content. The proposed methods can be applied to multiple digital data platforms at the same time and used to investigate human-nature interactions in conservation science and practice.
