Relevancer: Finding and Labeling Relevant Information in Tweet Collections

Hürriyetoğlu, Ali; Gudehus, Christian; Oostdijk, N.H.J.; Bosch, Antal van den

doi:10.1007/978-3-319-47874-6_15

Cited by 5 publications

(3 citation statements)

References 7 publications

(8 reference statements)


“…We could extend the training set with cluster mining using Relevancer (Hürriyetoglu et al, 2016), use the rule-based system to extend the training set (Hürriyetoglu, 2019), or use the rule-based system to generate fine-grained data that can be used in a multi-task setting.…”

Section: Discussion (mentioning)

confidence: 99%

COVCOR20 at WNUT-2020 Task 2: An Attempt to Combine Deep Learning and Expert rules

Hürriyetoğlu, Safaya, Oostdijk, et al. 2020

Proceedings of the Sixth Workshop on Noisy User-Generated Text (W-Nut 2020)

Self Cite


In the scope of WNUT-2020 Task 2, we developed various text classification systems, using deep learning models and one using linguistically informed rules. While both of the deep learning systems outperformed the system using the linguistically informed rules, we found that through the integration of (the output of) the three systems a better performance could be achieved than the standalone performance of each approach in a cross-validation setting. However, on the test data the performance of the integration was slightly lower than our best performing deep learning model. These results hardly indicate any progress in line of integrating machine learning and expert rules driven systems. We expect that the release of the annotation manuals and gold labels of the test data after this workshop will shed light on these perplexing results.


“…We could extend the training set with cluster mining using Relevancer (Hürriyetoglu et al, 2016), use the rule-based system to extend the training set (Hürriyetoglu, 2019), or use the rule-based system to generate fine-grained data that can be used in a multi-task setting.…”

Section: Discussion (mentioning)

confidence: 99%

COVCOR20 at WNUT-2020 Task 2: An Attempt to Combine Deep Learning and Expert rules

Hürriyetoğlu, Safaya, Oostdijk, et al. 2020

Preprint

Self Cite


“…For the information retrieval process positively labeled documents in a dataset are important and should not be missed, therefore achieving high recall is extremely important. However, there is generally a large number of documents that are relevant or not to the concerned topic and doing close reading for all documents and annotating them requires lots of time and resources (Hürriyetoglu et al, 2016; Hürriyetoǧlu et al, 2017). Therefore, ranking documents according to relevance to the investigated class may help to reduce close reading time and decrease the likelihood of missing critical information.…”

Section: Introduction (mentioning)

confidence: 99%

Zero-Shot Ranking Socio-Political Texts with Transformer Language Models to Reduce Close Reading Time

Kiymet, Hürriyetoğlu 2022

Preprint


We approach the classification problem as an entailment problem and apply zero-shot ranking to socio-political texts. Documents that are ranked at the top can be considered positively classified documents and this reduces the close reading time for the information extraction process. We use Transformer Language Models to get the entailment probabilities and investigate different types of queries. We find that DeBERTa achieves higher mean average precision scores than RoBERTa and when declarative form of the class label is used as a query, it outperforms dictionary definition of the class label. We show that one can reduce the close reading time by taking some percentage of the ranked documents that the percentage depends on how much recall they want to achieve. However, our findings also show that percentage of the documents that should be read increases as the topic gets broader.
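The trade-off this abstract describes, between how much of a ranked list is read and how much recall is reached, can be illustrated with a small sketch. This is not the authors' code: the function name `fraction_to_read`, the toy scores, and the toy labels are all hypothetical, and the scores merely stand in for whatever relevance scores (for example, entailment probabilities) a ranking model might produce.

```python
from typing import Sequence


def fraction_to_read(
    scores: Sequence[float], labels: Sequence[int], target_recall: float
) -> float:
    """Return the smallest fraction of the ranked list that must be read
    to reach target_recall. labels are 1 (relevant) or 0 (not relevant)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 <= target_recall <= 1.0:
        raise ValueError("target_recall must be between 0 and 1")
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("no relevant documents; recall is undefined")
    if target_recall == 0.0:
        return 0.0

    # Rank documents from highest to lowest score (hypothetical scores).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    found = 0
    for depth, (_, label) in enumerate(ranked, start=1):
        found += label
        # Stop at the first depth where recall meets the target.
        if found / total_relevant >= target_recall:
            return depth / len(ranked)
    return 1.0


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]
toy_labels = [1, 0, 1, 0, 1, 0]
print(fraction_to_read(toy_scores, toy_labels, 0.5))
print(fraction_to_read(toy_scores, toy_labels, 1.0))
```

In this toy ranking, a higher recall target requires reading deeper into the list, which is the general pattern the abstract reports; the actual percentages in the cited work depend on its data and models.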
