An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade

Lima, Roberta Rodrigues de; Fernandes, Anita Maria da Rocha; Bombasar, James Roberto; Silva, Bruno Alves da; Crocker, Paul; Leithardt, Valderi Reis Quietinho

doi:10.3390/bdcc6010008

Cited by 5 publications

(6 citation statements)

References 13 publications


“…Fig. 5 presents the generalization of the models according to the temporal attributes, considering the Accuracy [35]. It can be observed that, regardless of the model, the temporal attribute duration/frequency allowed a greater generalization in all tested algorithms.…”

Section: Results (mentioning)

confidence: 98%

“…This scale ranges from P1 (autonomous individual with some level of supervision and help needed) to P14 (completely dependent). This database was selected for the following reasons: (i) observations are presented at the level of activities and not sensors; (ii) presents a long period of labeled observations (one year); (iii) guarantees the non-existence of outliers within the same dependency profile; (iv) well-founded data simulation process [35]; (vi) no other dataset with the same characteristics was found. The data from this database are organized into two main sets, the first consisting of one-year observations on the P1 profile.…”

Section: Methods (mentioning)

confidence: 99%


Novelty detection algorithms to help identify abnormal activities in the daily lives of elderly people

Fernandes, Leithardt, Santana. 2024. IEEE Latin Am. Trans.


In old age, several common health conditions, chronic illnesses, and disabilities affect the individual's physical and mental health and prevent him from carrying out Activities of Daily Living. In this context, this article presents a comparative study between some Machine Learning algorithms used to identify behavioral abnormalities based on ADL (Activities of Daily Living), through the Novelty Detection technique. ADL data from eHealth Monitoring Open Data Project database were used to create a model that defines the baseline behavior of an elderly person, and new observations, to verify significant changes in behavior, are classified as outliers or abnormal. The Local Outlier Factor, One-class Support Vector Machine, Robust Covariance, and Isolation Forest algorithms were analyzed, and the Local Outlier Factor obtained the best result, reaching a precision and F1-Score of 96%. As elderly people can have completely different routines, the data from the dataset used are not generalizable, but specific to everyone. In this work, the issue of model retraining is not evaluated, however, a variation is recommended in the period of weeks necessary for model retraining. Despite the good performance obtained, it is necessary to consider reproducing the experiments with data from other databases, to improve the generalization of the proposed solution, as well as to carry out a more refined validation. It is also necessary to carry out experiments to evaluate whether the variation in the types of activities carried out throughout a day by an elderly person, as well as the inclusion of new activities in the elderly person's routine, can impact the performance of the proposed model. Link to graphical and video abstracts, and to code: https://latamt.ieeer9.org/index.php/transactions/article/view/8373


“…Accuracy represents the number of correctly classified instances relative to the total number of instances evaluated. According to Lima et al. (2022), although accuracy has proven to be the simplest and most widespread measure in the scientific literature, it presents some problems when evaluating performance on imbalanced datasets. According to the authors, there is a precision problem in that it cannot distinguish well between different distributions of misclassifications.…”

Section: Methodology (unclassified)
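The concern about accuracy on imbalanced data can be illustrated with a minimal sketch. The labels, class names (`ok`, `overload`), and helper functions below are hypothetical and are not taken from the cited work. They only show how a classifier that always predicts the majority class can score high accuracy while missing every minority instance.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall_per_class(y_true, y_pred):
    """Per-class recall: correctly predicted instances of a class / all instances of it."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical imbalanced sample: 95 majority-class and 5 minority-class instances.
y_true = ["ok"] * 95 + ["overload"] * 5
# A trivial classifier that always predicts the majority class.
y_pred = ["ok"] * 100

acc = accuracy(y_true, y_pred)
rec = recall_per_class(y_true, y_pred)
print(f"accuracy = {acc:.2f}")
print(f"recall per class = {rec}")
```

Here accuracy alone looks strong, while the per-class recall exposes that no minority-class instance was detected, which is why class-wise measures are often reported alongside accuracy.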

Aprendizado De Máquina Em Ambientes Hospitalares: Um Estudo De Análise De Tendências De Sobrecarga Em Sistemas De Tecnologias Da Informação E Comunicação

Luchtenberg, Fernandes, Liebel et al. 2023. Rev. Contemp.


In healthcare institutions, it is essential to have all the tools needed for patient flow to be managed quickly and efficiently. The information systems of these institutions must perform adequately and be available all day, throughout the year. In this context, this research aims to evaluate the application of Machine Learning algorithms so that, based on monitoring data, the system learns to anticipate a possible overload. The data used in this research come from the database of a company that provides monitoring services for hospital institutions in Santa Catarina. The study analyzed the application of the machine learning algorithms Decision Tree (DT), Long Short-Term Memory (LSTM), and K-Nearest Neighbors (KNN). The algorithm with the best accuracy was KNN, with 0.9603. Regarding the execution and training time of the algorithms, KNN again had the best training result, at 0.058 seconds. As for execution time, DT obtained the best result, at 0.0019 seconds. Although the LSTM algorithm had the worst training and execution times (680.17 s and 4.2 s, respectively), it achieved the best Recall, at 99% in predicting unavailability. Since the prediction of unavailability is the main criterion to be evaluated in this work, the LSTM algorithm obtained the best results overall.


“…Typically for classification tasks, the metrics are based on the confusion matrix [115]; however, for prediction, the metrics are based on the error [116]. The coefficient of determination (R²) measures the adjustment of a statistical model to the observed values of a random variable.…”

Section: Considered Measures (mentioning)

confidence: 99%
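As a minimal sketch of the error-based measures mentioned in this statement, the coefficient of determination can be computed as one minus the ratio of the residual sum of squares to the total sum of squares, and the RMSE as the square root of the mean squared error. The function names and the sample values below are illustrative assumptions, not data or code from the cited paper.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the observed values are constant")
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(observed, predicted)
err = rmse(observed, predicted)
print(f"R^2 = {r2:.3f}, RMSE = {err:.4f}")
```

An R² close to 1 indicates that the predictions explain most of the variance in the observed values, while the RMSE expresses the typical error in the units of the measured quantity.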

Fault Prediction Based on Leakage Current in Contaminated Insulators Using Enhanced Time Series Forecasting Models

Neto, Stefenon, Meyer et al. 2022. Sensors. Self-citation.


To improve the monitoring of the electrical power grid, it is necessary to evaluate the influence of contamination in relation to leakage current and its progression to a disruptive discharge. In this paper, insulators were tested in a saline chamber to simulate the increase of salt contamination on their surface. From the time series forecasting of the leakage current, it is possible to evaluate the development of the fault before a flashover occurs. In this paper, for a complete evaluation, the long short-term memory (LSTM), group method of data handling (GMDH), adaptive neuro-fuzzy inference system (ANFIS), bootstrap aggregation (bagging), sequential learning (boosting), random subspace, and stacked generalization (stacking) ensemble learning models are analyzed. From the results of the best structure of the models, the hyperparameters are evaluated and the wavelet transform is used to obtain an enhanced model. The contribution of this paper is related to the improvement of well-established models using the wavelet transform, thus obtaining hybrid models that can be used for several applications. The results showed that using the wavelet transform leads to an improvement in all the used models, especially the wavelet ANFIS model, which had a mean RMSE of 1.58 × 10⁻³, being the model that had the best result. Furthermore, the results for the standard deviation were 2.18 × 10⁻¹⁹, showing that the model is stable and robust for the application under study. Future work can be performed using other components of the distribution power grid susceptible to contamination because they are installed outdoors.
