Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval 2020
DOI: 10.1145/3443279.3443309

Gender Prediction Based on Vietnamese Names with Machine Learning Techniques

Abstract: Because biological gender is one aspect of how an individual is presented, much work has been done on gender classification based on people's names. Many approaches have been proposed for English and Chinese; still, few works have addressed Vietnamese so far. We propose a new dataset for gender prediction based on Vietnamese names. This dataset comprises over 26,000 full names annotated with genders and is available on our website for research purposes. In addition, this paper describes s…

Cited by 8 publications (2 citation statements)
References 13 publications (16 reference statements)
“…Gender distribution is an important facet to examine the quality of AND datasets because there is a correlation between name patterns and genders (Jia & Zhao, 2019; To et al, 2020; Wais, 2016). However, this facet has not received enough attention in previous datasets.…”
Section: Results (mentioning)
confidence: 99%
“…The study results show that the developed model can achieve fairly good accuracy, about 93%. However, previous studies have shown that traditional methods such as SVM [14], Naive Bayes [2], [14], and Decision Tree [15], [16] have limitations in handling long-term dependencies in sequential data, whereas LSTM is a type of recurrent neural network designed to address this problem [4], [17], [18].…”
Section: Introduction (unclassified)