“…IE process must be efficient enough to improve the effectiveness of big data analysis. Heterogeneity, dimensionality and diversity of data are important to handle for IE using big data [32,33]. However, as the volume of unstructured data doubles every year [1], it is becoming…”
Section: Event Extraction (EE) and Salient Facts Extraction
“…Semi-supervised techniques use both labeled and unlabeled corpus with small degree of supervision [121]. For large scale data, distant supervised learning [26], deep learning (CNN, RNN, DNN) [9,10,18,23,[31][32][33], transfer learning [25] techniques are more suitable for IE from free-text data.…”
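The distant-supervision idea cited above can be illustrated with a minimal sketch: a small knowledge base of (head, tail) → relation facts is aligned against raw sentences, and any sentence containing both entities is automatically labeled with the relation. This is only an illustration of the general technique, not the implementation of the cited work [26]; the knowledge base and sentences are made up for the example.

```python
# Toy distant supervision for relation extraction: sentences that mention
# both entities of a known fact are auto-labeled with that fact's relation,
# yielding weakly labeled training data without manual annotation.
# (Entities, relations, and sentences here are illustrative assumptions.)

KB = {
    ("Paris", "France"): "capital_of",
    ("Berlin", "Germany"): "capital_of",
}

sentences = [
    "Paris is the capital of France.",
    "Berlin lies in Germany.",
    "France exported wine last year.",
]

def distant_label(sentences, kb):
    labeled = []
    for s in sentences:
        for (head, tail), rel in kb.items():
            if head in s and tail in s:
                # the match may be noisy: co-occurrence does not guarantee
                # the sentence actually expresses the relation
                labeled.append((s, head, tail, rel))
    return labeled

train = distant_label(sentences, KB)
```

The noise visible in such auto-labeled data is exactly why distant supervision is usually paired with robust learners (e.g. the deep models listed above) rather than used raw.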
“…Unstructured big data comes with high dimensionality [16,18,66], diversity [55,124], dynamicity [32] and heterogeneity [33,131]. Dimensionality reduction [18] and semantic annotation [131] can further improve the IE performance of high dimensional and heterogeneous data respectively.…”
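One simple way to bound the dimensionality of heterogeneous text features, sketched below, is the hashing trick: token counts over an unbounded vocabulary are projected into a fixed-size vector. This is just one option among the dimensionality-reduction techniques the survey alludes to; the cited work [18] may use a different method entirely.

```python
# Hashing-trick feature vector: arbitrary vocabulary -> fixed dimension.
# Collisions trade a little accuracy for a hard cap on dimensionality,
# which matters when unstructured big data keeps introducing new tokens.

def hashed_features(tokens, dim=16):
    vec = [0] * dim
    for tok in tokens:
        vec[hash(tok) % dim] += 1  # bucket index is hash modulo dimension
    return vec

doc = "unstructured big data comes with high dimensionality".split()
v = hashed_features(doc, dim=16)
# len(v) stays 16 no matter how large the vocabulary grows
```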
Section: Dimensionality and Heterogeneity
Abstract
The process of information extraction (IE) extracts useful information from unstructured or semi-structured data. Big data poses new challenges for IE techniques with the rapid growth of multifaceted, also called multidimensional, unstructured data. Traditional IE systems are inefficient at dealing with this huge deluge of unstructured big data, and the volume and variety of big data demand improved computational capabilities from these IE systems. It is necessary to understand the competency and limitations of existing IE techniques related to data pre-processing, data extraction and transformation, and representation for huge volumes of multidimensional unstructured data. Numerous studies have been conducted on IE, addressing the challenges and issues for different data types such as text, image, audio and video, but very little consolidated research has investigated the task-dependent and task-independent limitations of IE covering all data types in a single study. This research addresses that limitation and presents a systematic literature review of state-of-the-art IE techniques for a variety of big data, consolidating all data types. Recent challenges of IE are identified and summarized, and potential solutions are proposed as future research directions in big data IE. The research is significant in terms of recent trends and challenges related to big data analytics, and its outcomes and recommendations will help make big data analytics more productive. (Adnan and Akbar, J Big Data (2019) 6:91)

Introduction
The information extraction (IE) process extracts useful structured information from unstructured data in the form of entities, relations, objects, events and many other types. The extracted information is used to prepare data for analysis; therefore, efficient and accurate transformation of unstructured data in the IE process improves data analysis. Numerous techniques have been introduced for different data types, i.e. text, image, audio, and video. Advances in technology have promoted the rapid growth of data volume in recent years, and the volume, variety (structured, unstructured, and semi-structured data) and velocity of big data have changed the paradigm of the computational capabilities of systems. IBM estimated that more than 2.5 quintillion bytes of data are generated every day, and it was also predicted that unstructured data from diverse sources will grow to 90% of all data within a few years. IDC estimated that unstructured data will constitute 95% of global data by 2020, with an estimated 65% annual growth rate [1]. The common characteristics of unstructured data are: (i) it comes in multiple formats...
“…Finally, the TF-IDF [9,10] method was used to extract the keywords of the literature, and keywords with relatively strong co-occurrence relations were connected to form a knowledge graph. Shi et al. [11] also used TF-IDF to extract keywords to construct a knowledge graph. However, unlike Wang et al. [7], Shi et al. [11] trained a CNN-based classifier that automatically divides the geoscience literature into four categories (geophysics, geology, remote sensing, and geochemistry) and then constructs the corresponding knowledge graph.…”
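The TF-IDF keyword-extraction step can be sketched in a few lines (the cited works almost certainly use library implementations; the documents and formula weighting below are illustrative assumptions). Terms frequent in one document but rare across the corpus score highest and become candidate keyword nodes for the graph.

```python
# Minimal TF-IDF: tf = term count / doc length, idf = log(N / doc frequency).
# Terms unique to one document outscore terms shared across documents.

import math
from collections import Counter

docs = [
    "earthquake hazard geology survey".split(),
    "remote sensing image geology".split(),
    "geochemistry sample analysis".split(),
]

def tfidf(doc, docs):
    tf = Counter(doc)
    n = len(docs)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for d in docs if term in d)       # documents containing term
        scores[term] = (count / len(doc)) * math.log(n / df)
    return scores

scores = tfidf(docs[0], docs)
top = max(scores, key=scores.get)
# "geology" appears in two documents, so it scores below the unique terms
```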
Constructing a knowledge graph of the geological hazards literature can facilitate the reuse of that literature and provide a reference for geological hazard governance. Named entity recognition (NER), a core technology for constructing a geological hazard knowledge graph, faces the challenges that named entities in the geological hazard literature are diverse in form, ambiguous in semantics, and uncertain in context. This makes it difficult to design practical features for NER classification. To address this problem, this paper proposes a deep learning-based NER model, namely the deep, multi-branch BiGRU-CRF model, which combines a multi-branch bidirectional gated recurrent unit (BiGRU) layer with a conditional random field (CRF) model. In an end-to-end, supervised process, the proposed model automatically learns and transforms features through the multi-branch bidirectional GRU layer and refines the output with the CRF layer. Alongside the deep, multi-branch BiGRU-CRF model, the authors also propose a pattern-based corpus construction method to build the corpus the model requires. Experimental results indicated that the proposed model outperformed state-of-the-art models and constructed a large-scale geological hazard literature knowledge graph containing 34,457 entity nodes and 84,561 relations.
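The CRF layer's role at inference time is Viterbi decoding: pick the tag sequence that maximizes the per-token emission scores (produced by the BiGRU) plus tag-transition scores, which lets the model penalize invalid sequences such as "I" following "O". The scores below are hypothetical toy values, not the paper's trained parameters; this sketch only shows the dynamic program itself.

```python
# Viterbi decoding over B/I/O tags for a 3-token sentence.
# best[t][tag] holds (best score ending in tag at position t, the path).

TAGS = ["B", "I", "O"]

# emission[t][tag]: score a (hypothetical) BiGRU assigns to tag at position t
emission = [
    {"B": 2.0, "I": 0.1, "O": 0.5},
    {"B": 0.2, "I": 1.8, "O": 0.4},
    {"B": 0.3, "I": 0.2, "O": 1.5},
]
# transition[prev][cur]: e.g. "I" after "O" is strongly penalized
transition = {
    "B": {"B": -0.5, "I": 1.0, "O": 0.0},
    "I": {"B": -0.5, "I": 0.5, "O": 0.0},
    "O": {"B": 0.5, "I": -5.0, "O": 0.2},
}

def viterbi(emission, transition):
    best = [{tag: (emission[0][tag], [tag]) for tag in TAGS}]
    for t in range(1, len(emission)):
        layer = {}
        for cur in TAGS:
            # extend the best previous path for each candidate current tag
            layer[cur] = max(
                (best[t - 1][prev][0] + transition[prev][cur] + emission[t][cur],
                 best[t - 1][prev][1] + [cur])
                for prev in TAGS
            )
        best.append(layer)
    return max(best[-1].values())[1]

path = viterbi(emission, transition)
# with these scores the decoder prefers a well-formed B -> I -> O span
```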
“…data to making full use of those free raw resources and developing standard and scalable models to process the fast-growing collection of available text corpora (Shi et al., 2018; Tran et al., 2017; Zhu & Iglesias, 2018).…”
A wealth of detailed data about geological topics and geoscience knowledge is buried in the geoscience literature and rarely used. Named entity recognition (NER) offers both opportunities and challenges for leveraging this data for analysis and further information extraction. Existing NER models and techniques mainly rely on rule-based and supervised approaches, and developing such systems requires costly manual effort. In this paper, we first design a generic stepwise framework for domain-specific NER. Following this framework, domain-specific entities and domain-general words are collected and selected as seed terms, and normalization and grouping processes are applied to these seed terms for further analysis. A random extraction algorithm based on a unigram language model is used to generate a large-scale training data set consisting of probabilistically labeled pseudosentences, and each generated sentence is fed into the self-training and learning algorithm. Experimental results on two constructed data sets demonstrate that the proposed model effectively recognizes and identifies geological named entities.
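The pseudosentence-generation step can be sketched as follows: terms are drawn from a unigram language model over seed vocabularies, and each drawn term carries its seed category along, so the output arrives probabilistically labeled. The seed terms, categories, and weights below are illustrative assumptions, not the paper's actual seed lists.

```python
# Generate a probabilistically labeled pseudosentence by sampling terms
# from a weighted unigram model; each term keeps its seed-list label.

import random

seed_terms = {
    "granite": ("ROCK", 3), "basalt": ("ROCK", 2),
    "permian": ("AGE", 2), "fault": ("STRUCT", 1),
    "the": ("O", 6), "of": ("O", 4),
}

def pseudo_sentence(length, rng):
    terms = list(seed_terms)
    weights = [seed_terms[t][1] for t in terms]
    words = rng.choices(terms, weights=weights, k=length)  # unigram sampling
    return [(w, seed_terms[w][0]) for w in words]

rng = random.Random(7)          # seeded for reproducibility
sent = pseudo_sentence(5, rng)
# every generated token is pre-labeled with its seed category
```

Because labels come from the seed lists rather than human annotation, the resulting corpus is noisy by construction, which is why the cited approach pairs it with self-training.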