2021
DOI: 10.37936/ecti-cit.2021151.228621

Information Extraction Tasks based on BERT and SpaCy on Tourism Domain

Abstract: In this paper, we present two methodologies for extracting particular information from the full text returned by a search engine, to assist users. The approaches are based on three tasks: named entity recognition (NER), text classification, and text summarization. The first step is building the training data and cleansing the data. We consider the tourism domain, including restaurant, hotel, shopping, and tourism data sets crawled from websites. First, the tourism data are gathered and the vocabularies are built.…
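The data-preparation step described in the abstract (cleansing crawled text and building a vocabulary) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `clean_text` and `build_vocabulary`, the regex-based cleaning rules, lowercase whitespace tokenization, and the `min_count` threshold are all hypothetical choices.

```python
import re
from collections import Counter


def clean_text(text: str) -> str:
    """Strip HTML tags and punctuation, lowercase, and collapse whitespace (assumed rules)."""
    text = re.sub(r"<[^>]+>", " ", text)          # drop leftover HTML tags from crawling
    text = re.sub(r"[^\w\s]", " ", text.lower())   # replace punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()


def build_vocabulary(docs, min_count=1):
    """Count whitespace tokens across cleaned documents; keep tokens seen >= min_count times.

    The result is ordered by descending count, then alphabetically.
    """
    counts = Counter(tok for doc in docs for tok in clean_text(doc).split())
    vocab = {tok: c for tok, c in counts.items() if c >= min_count}
    return dict(sorted(vocab.items(), key=lambda kv: (-kv[1], kv[0])))


# Hypothetical crawled snippets
docs = [
    "<p>Great hotel near the beach!</p>",
    "The hotel restaurant serves Thai food.",
    "Shopping mall near the hotel.",
]
print(build_vocabulary(docs, min_count=2))  # {'hotel': 3, 'the': 3, 'near': 2}
```

Whitespace tokenization is a simplification that suits English text; a real pipeline would likely use a proper tokenizer and a stop-word list.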

Cited by 16 publications (13 citation statements). References 10 publications.
“…spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python [36,37]. spaCy supports over 64 language models trained on either newspaper articles, media, blogs, comments etc.…”
Section: Spacy (mentioning; confidence: 99%)
“…For example, in the sentence "I have to Google why Google has a lot of employees", the model would correctly recognize the first Google as a verb and the latter as an organisation entity, despite being constructed in a similar manner. Because of the models' reliance on sentence syntax for entity extraction [37], it is able to extract entities regardless of spelling errors or presence of noisy elements within the entity. In our previous study [16] we showed that the syntax reliance of spaCy gives it a much higher recall as compared to the gazetteer based DBpedia tool.…”
Section: Spacy (mentioning; confidence: 99%)
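The recall argument in the statement above can be illustrated with a toy exact-match gazetteer: a misspelled mention never matches a dictionary entry, so it becomes a false negative and lowers recall. This sketch assumes single-token entities and case-insensitive lookup; `gazetteer_match`, `recall`, and the sample sentence are hypothetical and do not reproduce the DBpedia tool or spaCy's model.

```python
def gazetteer_match(tokens, gazetteer):
    """Return the tokens that appear verbatim (case-insensitive) in the gazetteer."""
    entries = {entry.lower() for entry in gazetteer}
    return [tok for tok in tokens if tok.lower() in entries]


def recall(predicted, gold):
    """Recall = |predicted ∩ gold| / |gold|, computed over unique mentions."""
    gold_set = set(gold)
    if not gold_set:
        return 0.0
    return len(gold_set & set(predicted)) / len(gold_set)


# Hypothetical example: the misspelled "Gogle" is still an organisation mention.
gazetteer = {"Google", "Bangkok"}
tokens = "Gogle opened an office in Bangkok".split()
gold = ["Gogle", "Bangkok"]
found = gazetteer_match(tokens, gazetteer)
print(found, recall(found, gold))  # ['Bangkok'] 0.5
```

A model that relies on sentence context rather than exact dictionary lookup can still tag "Gogle", which is the kind of recall gain the quoted study attributes to spaCy.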
“…Specifically, empirical results shown by transformers dramatically outperform the classical pipeline of machine learning models with a bag-of-words representation of the most common and relevant words of the texts according to algorithms such as TF-IDF [16]. Driven by the amazing results of transformer models, several communities like finance [37], energy [8] or tourism [9] are starting to use them for a wide variety of tasks.…”
Section: Fundamentals Of Large Language Models For Supervised Classif... (mentioning; confidence: 99%)
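The classical baseline mentioned above represents each text as a bag of words weighted by TF-IDF. A minimal sketch of one common variant follows: term frequency normalized by document length, multiplied by idf = log(N / df), where N is the number of documents and df is the number of documents containing the term. Libraries such as scikit-learn use smoothed variants by default, so exact weights will differ; the `tf_idf` helper and lowercase whitespace tokenization are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / tokens in doc) * log(N / df)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


docs = ["hotel near beach", "hotel with pool", "thai restaurant near market"]
weights = tf_idf(docs)
print(max(weights[0], key=weights[0].get))  # beach
```

"beach" outranks "hotel" and "near" in the first document because it appears in no other document; a term that occurs in every document gets weight 0.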
“…For example, in tourism domain, NER is used to label the point of interest, reviews, hotels, location, etc. (Chantrapornchai and Tunsakul, 2021); in medical domain, NER identifies clinical entities in electronic medical records and assigns them to previously defined categories, such as disease, image review, laboratory examination, operation, drug and anatomy (Kong et al , 2021); in commerce domain, NER is introduced to identify entity name of the cross border e-commerce commodity (Luo et al , 2020); in architecture domain, NER also successfully recognizes bridge names, structural members, member elements, locations of members or elements, structural defects and negative descriptions in bridge inspection reports (Li et al , 2021).…”
Section: Related Work (mentioning; confidence: 99%)
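NER training data for entities such as hotels and locations is often stored as character-offset spans and converted to token-level BIO tags before training a tagger. The sketch below assumes `(start, end, label)` tuples with an exclusive end, similar to spaCy v2-style training examples, plus whitespace tokenization. The helper `spans_to_bio`, the sample sentence, and the labels `HOTEL` and `LOC` are hypothetical and are not taken from Chantrapornchai and Tunsakul (2021).

```python
def spans_to_bio(text, entities):
    """Convert character-offset entity spans to (token, BIO tag) pairs.

    entities: list of (start, end, label) tuples, end exclusive.
    Tokens are whitespace-delimited; a token is tagged when its first
    character falls inside a span (B- at the span start, I- afterwards).
    """
    pairs = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)  # character offset of this token
        pos = start + len(token)
        tag = "O"
        for ent_start, ent_end, label in entities:
            if ent_start <= start < ent_end:
                tag = ("B-" if start == ent_start else "I-") + label
                break
        pairs.append((token, tag))
    return pairs


text = "We stayed at Riverside Hotel in Chiang Mai"
hotel = text.index("Riverside Hotel")
loc = text.index("Chiang Mai")
ents = [(hotel, hotel + len("Riverside Hotel"), "HOTEL"), (loc, loc + len("Chiang Mai"), "LOC")]
for token, tag in spans_to_bio(text, ents):
    print(token, tag)  # Riverside -> B-HOTEL, Hotel -> I-HOTEL, Chiang -> B-LOC, Mai -> I-LOC
```

Spans that start or end mid-token are not handled specially here; a production converter would usually flag such misaligned annotations.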