2019
DOI: 10.1007/s00521-019-04334-2
A deep learning classifier for sentence classification in biomedical and computer science abstracts

Abstract: The automatic classification of abstract sentences into their main elements (background, objectives, methods, results, conclusions) is a key tool to support scientific database querying, to summarize relevant literature, and to assist in the writing of new abstracts. In this paper, we propose a novel deep learning approach based on a convolutional layer and a bi-directional gated recurrent unit to classify sentences of abstracts. First, the proposed neural network was tested on a publicly available reposito…

Cited by 33 publications (20 citation statements)
References 40 publications
“…Then, we apply the non-parametric Wilcoxon test for measuring statistical significance [62]. To compare the different classifiers, we use the popular area under the curve (AUC) of the receiver operating characteristic (ROC) curve [45,63,64], computed on the rolling window test data. The ROC curve shows the performance of a classifier for a target class across all decision threshold values (T_TFD and T_FUR), plotting the False Positive Rate (FPR) on the x-axis versus the True Positive Rate (TPR) on the y-axis.…”
Section: Discussion (mentioning, confidence: 99%)
“…The AUC = ∫ ROC dT measures the global discriminatory performance of a classifier. Often, the AUC values are interpreted as [64]: 0.5, equal to a random classifier; 0.6, reasonable; 0.7, good; 0.8, very good; 0.9, excellent; and 1, perfect. The ROC curve analysis has two main advantages for evaluating classifiers [63].…”
Section: Discussion (mentioning, confidence: 99%)
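The AUC quoted above has an equivalent rank-based reading: it is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. This can be sketched in a few lines of Python; `roc_auc` is an illustrative helper, not code from the cited paper:

```python
def roc_auc(labels, scores):
    """Rank-based AUC (Mann-Whitney statistic): the probability that a
    random positive example is scored above a random negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Count pairwise wins of positives over negatives; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfectly ranked classifier reaches 1.0; 0.5 matches random guessing.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

Threshold-by-threshold integration of the ROC curve yields the same value, which is why the rank formulation is a convenient cross-check.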
“…In recent scholarly works, there have been remarkable developments in deep learning [12, 13]. Architectures such as the convolutional neural network (CNN), LSTM, and GRU have obtained competitive results in several competitions (e.g., computer vision, signal, and natural language processing) [14]. Long Short-Term Memory networks (“LSTMs”), introduced in [15], are a special kind of RNN, capable of learning long-term dependencies.…”
Section: Methods (mentioning, confidence: 99%)
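The gating mechanism that lets GRUs (like LSTMs) retain information over long spans can be illustrated with a minimal NumPy sketch of a single GRU step; the weight shapes and toy dimensions below are arbitrary assumptions for illustration, not the architecture used in the paper:

```python
import numpy as np

def gru_step(x, h, params):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h)               # update gate: keep vs. rewrite
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate: gate old state
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate new state
    return (1 - z) * h + z * h_tilde           # interpolate old and new state

rng = np.random.default_rng(0)
d_in, d_h = 4, 3                               # toy input/hidden sizes
params = [rng.normal(scale=0.1, size=(d_h, d_in)) if i % 2 == 0
          else rng.normal(scale=0.1, size=(d_h, d_h)) for i in range(6)]
h = np.zeros(d_h)
for t in range(5):                             # run over a toy sequence
    h = gru_step(rng.normal(size=d_in), h, params)
```

Because the new state is a convex combination of the old state and a bounded candidate, gradients can flow through the `(1 - z) * h` path, which is the standard intuition for why gated units learn long-term dependencies.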
“…However, such a design assumes that the class of the current sentence is conditionally independent of the classes of the future sentences and the previous n (n ≥ 2) sentences given the class of the previous sentence. The other model, Word-BiGRU [10] employs word embeddings within the same sentence to generate sentence embeddings, which are further used to label the class of the sentence. The Word-BiGRU model utilizes convolution layers with filter sizes of 5 to integrate the words within a sentence.…”
Section: Abstract Sentence Classification (mentioning, confidence: 99%)
“…The fifth model, Word-BiGRU [10], also generates each sentence embedding by integrating the word embeddings within each sentence. This model incorporates the relationship among different sentences by a bidirectional gated recurrent unit (GRU).…”
Section: Baseline Models (mentioning, confidence: 99%)
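The Word-BiGRU idea described above, building a sentence embedding by convolving windows of five word embeddings, can be sketched as follows; `sentence_embedding`, the filter shapes, the tanh activation, and the max-pooling step are illustrative assumptions, not the authors' exact architecture:

```python
import numpy as np

def sentence_embedding(word_vecs, filters, window=5):
    """Slide a window of `window` word vectors over the sentence, apply
    convolution filters, and max-pool over positions to get a fixed-size
    sentence embedding."""
    n_words, d = word_vecs.shape
    # Zero-pad so sentences shorter than the window still yield one window.
    pad = max(0, window - n_words)
    padded = np.vstack([word_vecs, np.zeros((pad, d))])
    windows = np.stack([padded[i:i + window].ravel()
                        for i in range(len(padded) - window + 1)])
    feats = np.tanh(windows @ filters)   # (positions, n_filters)
    return feats.max(axis=0)             # max-pool over window positions

rng = np.random.default_rng(1)
d, n_filters = 8, 16                     # toy embedding/filter sizes
filters = rng.normal(scale=0.1, size=(5 * d, n_filters))
emb = sentence_embedding(rng.normal(size=(12, d)), filters)  # 12-word sentence
```

In the cited model, a bidirectional GRU would then run over the sequence of such sentence embeddings so that each sentence's label can draw on its neighbors in the abstract.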