2021
DOI: 10.3390/info12020048

Coming to Grips with Age Prediction on Imbalanced Multimodal Community Question Answering Data

Abstract: For almost every online service, it is fundamental to understand patterns, differences and trends revealed by age demographic analysis; for example, take the discovery of malicious activity, including identity theft, violation of community guidelines and fake profiles. In the particular case of platforms such as Facebook, Twitter and Yahoo! Answers, user demographics have impacts on their revenues and user experience; demographics assist in ensuring that the needs of each cohort are fulfilled via personalizing …

Cited by 15 publications (15 citation statements)
References 41 publications

“…Although the methodology adopted here is learner-independent, we conducted our study with the Random Forest (RF) classifier [11], which is increasingly being employed in several application scenarios, even in the context of high-dimensional or imbalanced problems (e.g., [12–15,38,39]). In brief, the RF classifier can be considered as a special case of bagging, an ensemble approach that combines predictions from multiple classifiers […] More in detail, the ranking-based approach leverages a proper evaluation criterion to weight each single feature based on its relevance to the target class; then, according to their weights, the features are ordered from the most important to the least important, and only a predefined number of the top-ranked features are used for classification.…”
Section: Classification Methods and Evaluation Metrics
mentioning
confidence: 99%
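
The ranking-based selection described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the evaluation criterion used here (absolute Pearson correlation with a numeric class label) is an assumption chosen for brevity, and the names abs_correlation, rank_features and select_top_k are hypothetical.

```python
from statistics import mean

def abs_correlation(xs, ys):
    """Assumed criterion: absolute Pearson correlation (0.0 for a constant column)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5

def rank_features(rows, labels, criterion=abs_correlation):
    """Weight every feature, then order feature indices from most to least relevant."""
    n_features = len(rows[0])
    weights = [criterion([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: weights[j], reverse=True)

def select_top_k(rows, labels, k):
    """Keep only the k top-ranked features; returns (kept indices, reduced rows)."""
    keep = sorted(rank_features(rows, labels)[:k])
    return keep, [[r[j] for j in keep] for r in rows]

# Toy data: feature 0 tracks the label, feature 1 is constant, feature 2 is noise.
rows = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1]]
labels = [0, 0, 1, 1]
kept, reduced = select_top_k(rows, labels, k=1)
```

Any other relevance measure can be passed through the criterion argument; the ordering and truncation steps stay the same.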
“…Although the methodology adopted here is learner-independent, we conducted our study with the Random Forest (RF) classifier [11], which is increasingly being employed in several application scenarios, even in the context of high-dimensional or imbalanced problems (e.g., [12–15,38,39]). In brief, the RF classifier can be considered as a special case of bagging, an ensemble approach that combines predictions from multiple classifiers built from different bootstrap samples of the training data.…”
Section: Classification Methods and Evaluation Metrics
mentioning
confidence: 99%
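
The bagging idea in the statement above (several classifiers, each built from a different bootstrap sample, combined by voting) can be sketched as below. This is only an illustration under stated assumptions: the base learner is a toy one-feature threshold rule for binary 0/1 labels rather than a decision tree, and the per-split random feature selection that a full Random Forest adds on top of bagging is left out. All function names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    """Draw len(X) training examples with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_stump(X, y):
    """Toy base learner: the single-feature threshold rule with best training accuracy."""
    best_acc, best_rule = -1, None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for low, high in ((0, 1), (1, 0)):
                acc = sum((high if row[j] > t else low) == label
                          for row, label in zip(X, y))
                if acc > best_acc:
                    best_acc, best_rule = acc, (j, t, low, high)
    j, t, low, high = best_rule
    return lambda row: high if row[j] > t else low

def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train one base learner per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(*bootstrap_sample(X, y, rng)) for _ in range(n_estimators)]

def predict(models, row):
    """Combine the individual predictions by majority vote."""
    return Counter(m(row) for m in models).most_common(1)[0][0]

# Toy one-dimensional data with two well-separated classes.
X = [[0.0], [1.0], [2.0], [3.0], [7.0], [8.0], [9.0], [10.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
models = fit_bagging(X, y)
```

Calling predict(models, [-1.0]) and predict(models, [11.0]) then returns the majority vote of the 25 stumps for points on either side of the gap.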
“…This demography analysis can be reported as real concern feeding artificial intelligence in proper exploration within [184]. Relatively, Figueroa et al [185] suggested a multi-modal method for automatic age recognition across CQA platforms that uses text, image, and metadata [186]. Social influence on user's CQA contribution: In a social question-answer community, Dong et al [187] explored the role of social influence on endorsement behaviour.…”
Section: Topical Expert Identification
mentioning
confidence: 99%
“…If demographic prediction can be performed using multi-source data to enrich the data volumes and features, the results will be more certain. Second, although some works do use heterogeneous data-sets from several sources at the same time to train the prediction model, the multi-source data used in these works are merged in a hard-matching method which does not achieve data fusion in the true sense [12], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], [45]. The hard-matching method refers to extracting users or features that overlap across multiple datasets, and then splicing them together to create a new dataset for model training.…”
mentioning
confidence: 99%
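
The hard-matching method described in the last statement (keep only the users that overlap across datasets, then splice their features together) can be sketched as follows. The data layout (one dict per source, mapping a user id to a feature dict), the s0_/s1_ column prefixes and the sample records are assumptions made for illustration only.

```python
def hard_match(*sources):
    """Keep users present in every source and concatenate their features."""
    shared = set(sources[0]).intersection(*sources[1:])
    merged = {}
    for user in sorted(shared):
        row = {}
        for i, src in enumerate(sources):
            for name, value in src[user].items():
                # Prefix with the source index so equal feature names do not collide.
                row[f"s{i}_{name}"] = value
        merged[user] = row
    return merged

# Hypothetical records from two sources; only "u2" appears in both.
forum = {"u1": {"posts": 12}, "u2": {"posts": 3}}
shop = {"u2": {"orders": 5}, "u3": {"orders": 1}}
merged = hard_match(forum, shop)
```

Users found in only one source ("u1" and "u3" here) are dropped entirely, which is the loss of data that the statement contrasts with data fusion in the true sense.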