Knowledge of material properties, microstructure, underlying material composition, and manufacturing process parameters that the material has undergone is of significant interest to materials scientists and engineers. A large amount of information of this nature is available in publications in the form of experimental measurements, simulation results, etc. However, getting to the right information of this kind that is relevant for a given problem on hand is a non-trivial task. First, an engineer has to go through a large collection of documents to select the right ones. Then, the engineer has to scan through these selected documents to extract relevant pieces of information. Our goal is to help automate some of these steps. Traditional search engines are not of much help here, as they are keyword centric and weak on relation processing. In this paper, we present a domain-specific search engine that processes relations to significantly improve search accuracy. The engine preprocesses material publication repositories to extract entities such as material compositions, material properties, manufacturing processes, process parameters, and their values and builds an index using these entities and values. The engine then uses this index to process user queries to retrieve relevant publication fragments. It provides a domain-specific query language with relational and logical operators to compose complex queries. We have conducted an experiment on a small library of publications on steel on which searches such as “get the list of publications which have carbon composition between 0.2 and 0.3 and on which tempering is carried out for about 30 to 40 min” are performed. We compare the results of our search engine with the results of a keyword-based search engine.
We report in this paper a way of doing Word Sense Disambiguation (WSD) that has its origin in multilingual MT and that is cognizant of the fact that parallel corpora, wordnets and sense annotated corpora are scarce resources. With respect to these resources, languages show different levels of readiness; however a more resource fortunate language can help a less resource fortunate language. Our WSD method can be applied to a language even when no sense tagged corpora for that language is available. This is achieved by projecting wordnet and corpus parameters from another language to the language in question. The approach is centered around a novel synset based multilingual dictionary and the empirical observation that within a domain the distribution of senses remains more or less invariant across languages. The effectiveness of our approach is verified by doing parameter projection and then running two different WSD algorithms. The accuracy values of approximately 75% (F1-score) for three languages in two different domains establish the fact that within a domain it is possible to circumvent the problem of scarcity of resources by projecting parameters like sense distributions, corpus-co-occurrences, conceptual distance, etc. from one language to another.
Devices like solar cells, batteries etc. often comprise of a host of material types including organic, inorganic and hybrid materials. The fabrication procedures for these devices involve screening or designing the right set of materials and then subjecting them to a sequence of operations under very specific conditions. The performance characteristics of a device critically depend on the materials used in its fabrication, the specific operations carried out, their operating conditions and the specific sequence in which they are carried out. The space of potential materials, operations and operating conditions is vast, and selecting the right combination thereof to achieve the desired characteristics is a knowledge intensive activity. A large amount of such device fabrication knowledge is available in the form of publications, patents, company reports and so on. In this paper, we present a system that systematically extracts this knowledge from materials science literature. The extracted knowledge is represented as knowledge graphs conforming to an ontology that can be queried to make informed decisions in device fabrication procedures. The system first identifies the set of relevant paragraphs that contain fabrication knowledge. It then employs state of the art entity and relation extraction models to identify instances of operations, methods, materials, etc. and relations between them. The system then applies an unsupervised algorithm to identify sequences of operations representing fabrication procedures. We applied our system on solar cell fabrication knowledge extraction and achieved good performance. We believe our results provide much needed impetus for further work in this area.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.