Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names

Šandrih, Branislava; Krstev, Cvetana; Stanković, Ranka

doi:10.26615/978-954-452-056-4_122

Cited by 6 publications

(4 citation statements)

References 12 publications

“…The results of application of the methods and the tools used to label the terms in Serbian [24,25] are presented below, as well as the methods of supervised multi-class classification performed by the tool Weka 3.8.4. These methods were applied to our primary annotated dataset expanded with additional attributes.…”

Section: Results (mentioning)

confidence: 99%

Automated labeling of terms in medical reports in Serbian

2020

Turk J Elec Eng & Comp Sci

Nowadays, many electronic health reports (EHRs) are stored daily. They consist of a structured part and an unstructured section written in natural language. Due to the limited time for medical examination, EHRs are short reports which often contain errors and abbreviations. Therefore, it is a challenge to process an EHR and extract knowledge from this part of the text for different purposes. This paper compares the results of three proposed methods for automatic labeling of medical terms in unstructured parts of EHRs. All words are categorized as words within the medical domain (symptoms, diagnoses, therapies, anatomy, specialties etc.) and those beyond the medical domain (numbers, places, stop words etc.). The first method is based on dictionaries of medical terms, the second on the training set, and the third on the training set and rules. The results of applying different methodologies to reduce a word to its basic form (pure, prefix, stem) are given for each of the methods. The paper shows that in labeling medical terms, the methods based on medical dictionaries (diagnosis, symptoms, medications etc.) do not produce the best results; it is therefore better to use a manually annotated part of the data set as a model. A significant number of words (17.36%) in medical reports are abbreviations and errors, so for better results, we should focus on creating rules to solve this problem. Better results are obtained for supervised methods compared to the dictionary-based method (with a relative improvement of 42.82%). The inclusion of the algorithm for processing errors and abbreviations improved the results further (with a relative improvement of 4.21%) and gave the largest F1-measure (0.9082). The advantage of the proposed method is that the use of rules for processing errors and abbreviations provides good results regardless of how the word is reduced to its basic form.

“…A bottom-up approach of natural language processing based on taggers and machine learning methods applied to texts in Serbian is shown in papers [24,25]. Taggers can be used to classify terms into groups with different tags.…”

Section: Related Work (mentioning)

confidence: 99%

Automated labeling of terms in medical reports in Serbian

2020

Turk J Elec Eng & Comp Sci

“…Also of great importance to our problem is the research conducted by [11], then [10], as well as [12], which address similar problems for their own languages. Of particular importance are the studies by [13][8], since their authors dealt with named entity recognition in Serbian. The paper [6] presents the HuggingFace library and platform, whose main advantage is the ability to share trained models and datasets with the wider public.…”

Section: Previous Solutions (unclassified)

Prepoznavanje Imenovanih Entiteta U Sprskom Jeziku Pomoću Transformer Arhitekture (Named Entity Recognition in Serbian Using the Transformer Architecture)

Cvejić

2022

Zbornik radova FTN

For training neural networks for natural language processing, there are already established patterns and principles that have been tested on English. The natural next step is to research and develop the field for other languages. This paper presents a model architecture for named entity recognition in Serbian. The model takes naturally written language as input. The trained model outputs the probabilities that a word belongs to a named category. Steps for improving and further developing the field are proposed.

“…Furthermore, textual corpora with over 20 million tokens have been collected and processed in order to train language models that can be used as a basis for grammatical and semantic error detection and correction in text in Serbian [11]. Significant work has been conducted in the field of NLP on the Faculty of Philology in Belgrade, among which the most recent research was related to NER [12] and diacritization of text in Serbian [13]. However, the tools and language resources have not been open for research or application in industry.…”

Section: Introduction (mentioning)

confidence: 99%

A Python package for text processing for Serbian: nlpheart

Ostrogonac, Rastović, Liliom

2020

Sci Tech Rev

Within the past two decades, text processing has become an important part of most state-of-the-art advanced automation systems. However, for many under-resourced languages it is still challenging to perform textual data preparation, due to the lack of adequate tools. In this work, a Python package for text processing for Serbian called nlpheart is presented. This package has been developed in industry and it is planned to be released as an open-source text processing tool for Serbian for academic purposes as well.
