In this paper, we extend a hybrid-based deduplication technique for entity reconciliation (ER) in two ways: first, we propose an algorithm that builds clusters given a pre-specified number of clusters K; second, we develop a crowd-based procedure for refining the clusters produced by the cluster generation phases. With the clusters refined, we aim to minimize the cost metric (R) of the solitary and compound cluster generation algorithms, improve the efficiency of deduplication, increase the accuracy of identifying duplicate records, and further reduce the crowdsourcing overhead incurred. Our experiments use three datasets commonly used in hybrid-based deduplication: paper, product, and restaurant. The performance results and evaluations show that our method outperforms the compared methods, offering low crowdsourcing cost and high deduplication accuracy, as well as better deduplication efficiency owing to the refined clusters.
INDEX TERMS Cluster refinement, minimization approach, triangular split and merger operations, entity reconciliation, crowdsourcing.
Product design experts depend on online customer reviews as a source of insight to improve product design. Previous works used aspect-based sentiment analysis to extract insight from product reviews. However, their approaches for requirements elicitation are less flexible than traditional tools such as interviews and surveys. They require costly data labeling or pre-labeled datasets, lack domain knowledge integration, and focus more on sentiment classification than flexible aspect-opinion analysis. Related works lack effective mechanisms for probing the customer feedback of complex configurable products. This study proposes a generic graph-based opinion mining and analysis method for product design improvement. First, a customer feedback data preprocessing and annotation pipeline that can incorporate designer-specified domain knowledge is proposed. Second, an intuitive opinion-aware labeled property graph data model is designed to ingest preprocessed feedback data and perform ad hoc opinion analysis. Applying the generic model to a real-world dataset demonstrates superior functionality and flexibility compared to related works. A wider range of analyses is supported in a single model without repeating data preprocessing and modeling. Specifically, the proposed method supports regular and comparative aspect-opinion analysis, aspect satisfaction/influence ranking, opinion trend extraction, and targeted aspect-opinion summarization.
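As an illustration of what an opinion-aware labeled property graph might look like, the following is a minimal sketch and not the authors' implementation. The node labels (`Review`, `Aspect`, `Opinion`), the relationship names (`EXPRESSES`, `ABOUT`), the `OpinionGraph` class, and the use of mean polarity as a satisfaction score are all assumptions made for this example; the abstract does not specify the schema or the ranking formula.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                                   # hypothetical labels: "Review", "Aspect", "Opinion"
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    rel: str                                     # hypothetical relations: "EXPRESSES", "ABOUT"
    props: dict = field(default_factory=dict)


class OpinionGraph:
    """Tiny in-memory labeled property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}
        self.edges = []
        self._opinion_count = 0

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = Node(node_id, label, props)

    def add_edge(self, src, dst, rel, **props):
        self.edges.append(Edge(src, dst, rel, props))

    def add_opinion(self, review_id, aspect, text, polarity):
        """Ingest one preprocessed (review, aspect, opinion, polarity) record."""
        aspect_id = f"aspect:{aspect}"
        opinion_id = f"opinion:{self._opinion_count}"
        self._opinion_count += 1
        if review_id not in self.nodes:
            self.add_node(review_id, "Review")
        if aspect_id not in self.nodes:
            self.add_node(aspect_id, "Aspect", name=aspect)
        self.add_node(opinion_id, "Opinion", text=text, polarity=polarity)
        self.add_edge(review_id, opinion_id, "EXPRESSES")
        self.add_edge(opinion_id, aspect_id, "ABOUT")

    def aspect_satisfaction(self):
        """Rank aspects by mean opinion polarity (an assumed scoring rule)."""
        polarities = defaultdict(list)
        for edge in self.edges:
            if edge.rel == "ABOUT":
                aspect_name = self.nodes[edge.dst].props["name"]
                polarities[aspect_name].append(self.nodes[edge.src].props["polarity"])
        scores = {name: sum(vals) / len(vals) for name, vals in polarities.items()}
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up example records; polarity is assumed to lie in [-1, 1].
graph = OpinionGraph()
graph.add_opinion("r1", "battery", "lasts all day", 1.0)
graph.add_opinion("r2", "battery", "drains quickly", -1.0)
graph.add_opinion("r2", "screen", "very sharp", 1.0)
ranking = graph.aspect_satisfaction()
print(ranking)
```

Because every opinion is stored as its own node linked to both a review and an aspect, further ad hoc queries (for example, filtering opinions by review date to look at trends) could be added as new traversals over the same graph without repeating preprocessing, which is the kind of flexibility the abstract describes.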