2022
DOI: 10.3390/bdcc6010008

An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade

Abstract: Classification problems are common activities in many different domains and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil represents a real challenge due to the complexity involved in assigning the correct category codes to a good, especially considering the tax penalties and legal implications of a misclassification. This work focuses on the training process of a classifier based on bidirectional encoder representations fro…

Cited by 5 publications (6 citation statements)
References: 13 publications

“…Fig. 5 presents the generalization of the models according to the temporal attributes, considering the Accuracy [35]. It can be observed that, regardless of the model, the temporal attribute duration/frequency allowed a greater generalization in all tested algorithms.…”
Section: Results; citation type: mentioning; confidence: 98%
“…This scale ranges from P1 (autonomous individual with some level of supervision and help needed) to P14 (completely dependent). This database was selected for the following reasons: (i) observations are presented at the level of activities and not sensors; (ii) presents a long period of labeled observations (one year); (iii) guarantees the non-existence of outliers within the same dependency profile; (iv) well-founded data simulation process [35]; (vi) no other dataset with the same characteristics was found. The data from this database are organized into two main sets, the first consisting of one-year observations on the P1 profile.…”
Section: Methods; citation type: mentioning; confidence: 99%
“…Accuracy represents the number of correctly classified instances relative to the total number of instances evaluated. According to Lima et al (2022), although accuracy has proven to be the simplest and most widespread measure in the scientific literature, it has some problems when evaluating performance on imbalanced datasets. According to the authors, accuracy has a precision problem: it cannot distinguish well between different distributions of misclassifications.…”
Section: Methodology (Metodologia); citation type: unclassified
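The caveat attributed to Lima et al. (2022) can be illustrated with a small sketch. The labels, the 95/5 class split, and the two classifiers below are hypothetical and invented for illustration. Per-class recall is used here only as one common way to expose what accuracy hides. It is not presented as the metric the citing authors adopted.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of instances whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the share of its instances predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in totals.items()}


# Hypothetical imbalanced data: 95 instances of class "A", 5 of class "B".
y_true = ["A"] * 95 + ["B"] * 5

# Hypothetical classifier 1: always predicts the majority class (5 errors, all on "B").
majority = ["A"] * 100

# Hypothetical classifier 2: also 5 errors, but 3 on "A" and 2 on "B".
mixed = ["A"] * 92 + ["B"] * 3 + ["B"] * 3 + ["A"] * 2

print(accuracy(y_true, majority))          # 0.95
print(accuracy(y_true, mixed))             # 0.95
print(per_class_recall(y_true, majority))  # {'A': 1.0, 'B': 0.0}
print(per_class_recall(y_true, mixed))
```

Both hypothetical classifiers score 0.95 accuracy, yet the first never recognizes class "B". This is the kind of difference in how misclassifications are distributed that accuracy alone does not reveal.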
“…Typically for classification tasks, the metrics are based on the confusion matrix [115]; however, for prediction, the metrics are based on the error [116]. The coefficient of determination (R²) measures the adjustment of a statistical model to the observed values of a random variable.…”
Section: Considered Measures; citation type: mentioning; confidence: 99%
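The distinction in this excerpt can be sketched in plain Python: classification metrics come from the confusion matrix, while prediction metrics come from the errors. The function names and toy data below are hypothetical. R² is computed with the standard definition 1 − SS_res/SS_tot, which is assumed here and may differ in detail from the formulation in the cited works [115], [116].

```python
import math


def confusion_matrix(y_true, y_pred, labels):
    """Count matrix for classification: rows = true label, columns = predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def mae(y, y_hat):
    """Mean absolute error for numeric predictions."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error for numeric predictions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((a - mean_y) ** 2 for a in y)            # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Classification: metrics are derived from the confusion matrix.
print(confusion_matrix(["A", "A", "B", "B"], ["A", "B", "B", "B"], ["A", "B"]))
# [[1, 1], [0, 2]]

# Prediction (regression): metrics are derived from the errors y - y_hat.
y = [3, 5, 7, 9]
y_hat = [2.5, 5, 7.5, 9]
print(mae(y, y_hat))                  # 0.25
print(round(r_squared(y, y_hat), 3))  # 0.975
```

In this toy case the residual sum of squares (0.5) is 2.5% of the total sum of squares around the mean (20), so R² = 0.975.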