DBpedia Entity Type Detection Using Entity Embeddings and N-Gram Models

Zhou, Hanqing; Zouaq, Amal; Inkpen, Diana

doi:10.1007/978-3-319-69548-8_21

Cited by 7 publications

(6 citation statements)

References 18 publications

“…This paradigm requires to provide gold type labels for entities and train binary or multi-class classifiers to classify entities to the right types. Some previous studies [5,36] proposed to use supervised classification for typing error detection, but it is hard to scale as the number of types is large for many KGs (in total 778 types in DBpedia). One work [5] tried to tackle the scalability problem by another entity type dataset of better quality, but this could not fundamentally solve the issue as external datasets are also noisy and may be unavailable.…”

Section: Classification (mentioning)

confidence: 99%

“…Data-driven approaches to deal with typing errors in factual KGs have a very broad spectrum, covering fully unsupervised clustering and outlier detection [1,21], semi-supervised noise models that could leverage noisy labels [10,12,14], and supervised noise detection methods that fully rely on gold labels [5,36]. In this study, we present a taxonomy of the KG typing error detection paradigms and comprehensively evaluate those paradigms on DBpedia.…”

Section: Introduction (mentioning)

confidence: 99%

Typing Errors in Factual Knowledge Graphs: Severity and Possible Ways Out

Yao; Barbosa. 2021. Preprint.

Factual knowledge graphs (KGs) such as DBpedia and Wikidata have served as part of various downstream tasks and are also widely adopted by artificial intelligence research communities as benchmark datasets. However, we found these KGs to be surprisingly noisy. In this study, we question the quality of these KGs, where the typing error rate is estimated to be 27% for coarse-grained types on average, and even 73% for certain fine-grained types. In pursuit of solutions, we propose an active typing error detection algorithm that maximizes the utilization of both gold and noisy labels. We also comprehensively discuss and compare unsupervised, semisupervised, and supervised paradigms to deal with typing errors in factual KGs. The outcomes of this study provide guidelines for researchers to use noisy factual KGs. To help practitioners deploy the techniques and conduct further research, we published our code and data.

“…2. This paper introduces novel techniques in clickstream data analytics to unleash key customer journeys through pattern mining using the n-grams and Student T-Test, which distinguishes between regular patterns and special sequences [40,48]. A model is proposed to predict users' transition from one state to another based on the higher-order Markov chains.…”

Section: A Case Study Through Clickstream Data Analysis (mentioning)

confidence: 99%

Real-time user clickstream behavior analysis based on apache storm streaming

Pal; Atkinson. 2021. Electron Commer Res.

This paper presents an approach to analyzing consumers’ e-commerce site usage and browsing motifs through pattern mining and surfing behavior. User-generated clickstream is first stored in a client-side browser. We build an ingestion pipeline to capture the high-velocity data stream from a client-side browser through Apache Storm, Kafka, and Cassandra. Given the consumer’s usage pattern, we uncover the user’s browsing intent through n-grams and Collocation methods. An innovative clustering technique is constructed through the Expectation-Maximization algorithm with Gaussian Mixture Model. We discuss a framework for predicting a user’s clicks based on the past click sequences through higher order Markov Chains. We developed our model on top of a big data Lambda Architecture which combines high throughput Hadoop batch setup with low latency real-time framework over a large distributed cluster. Based on this approach, we developed an experimental setup for an optimized Storm topology and enhanced Cassandra database latency to achieve real-time responses. The theoretical claims are corroborated with several evaluations in Microsoft Azure HDInsight Apache Storm deployment and in the Datastax distribution of Cassandra. The paper demonstrates that the proposed techniques help user experience optimization, building recently viewed products list, market-driven analyses, and allocation of website resources.
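The abstract above mentions n-grams and higher-order Markov chains for predicting a user's next click from past click sequences. As a rough illustration of that general idea only (not the cited paper's implementation), the following minimal Python sketch counts n-grams over toy sessions and predicts the most frequent next page. The function names, the page labels, and the sample sessions are all hypothetical.

```python
from collections import Counter, defaultdict


def ngrams(seq, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def fit_markov(sessions, order=2):
    """Count transitions from each `order`-length state to the next click."""
    transitions = defaultdict(Counter)
    for session in sessions:
        # Each (order + 1)-gram is one observed state -> next-click transition.
        for gram in ngrams(session, order + 1):
            transitions[gram[:-1]][gram[-1]] += 1
    return transitions


def predict_next(transitions, history, order=2):
    """Return the most frequent next click for the last `order` clicks,
    or None if that state was never observed."""
    state = tuple(history[-order:])
    counts = transitions.get(state)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy sessions; the page names are made up for illustration.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "search", "product", "cart"],
    ["home", "search", "product", "home"],
]
model = fit_markov(sessions, order=2)
print(predict_next(model, ["home", "search"]))
print(predict_next(model, ["search", "product"]))
```

A second-order chain conditions on the last two clicks rather than one, so it can tell apart paths that end on the same page. The cost is sparser counts, which is why unseen states return None here instead of a guess.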

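“…Early methods generally treated an entity and its entity type as the head and tail entities of a triple whose predicate is type; the resulting RDF triples can then be fed to an embedding learning process to learn vector representations and thereby perform the type prediction task. However, simply taking one type of an entity as the tail entity loses a lot of information, such as the context of the text in which the entity appears and the descriptions of that class of entity in external knowledge bases. Moreover, an entity's types are diverse and hierarchical, so embedding learning can also be applied to the text related to an entity [69,70], for example by turning the entity itself and its context into low-dimensional vector representations and then feeding these vectors into a deep learning model [71,72], which reduces type inference to a class decision performed by a neural network. An attention mechanism can also be introduced into the network at this point; it can come from the conditional constraints commonly used in natural language processing, or be derived from the type hierarchy information already known in the knowledge base. In short, external information [72] has to be introduced to improve the performance of embedding-based methods. [75] uses back-propagation to optimize the weights, but requires a very large number of iterations to converge; Minkov and Cohen [76] proposed in 2008 a random walk strategy based on a generative learning model to make the entities on a path more relevant; in particular, in 2010 Lao et al. [77] proposed the representative path ranking algorithm (PRA), which optimizes the edge-parameterized random walk model and adds constraints to improve computational efficiency.…”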

Section: Type inference mechanisms based on representation learning (unclassified)

Research progress of large-scale knowledge graph completion technology

Wang; Meng. 2020. Sci. Sin.-Inf.

