In this paper, we translate the multiplication of several matrices into a multi-way join operation among several relations. Matrix multiplication is widely used in many graph algorithms, such as those that calculate the transitive closure. These algorithms benefit from the multi-way join operation because it reduces the number of binary multiplications. Our implementation is based on the MapReduce framework, which allows us to provide scalable computation for large matrices. Although several papers have investigated matrix multiplication using MapReduce, this paper takes a different perspective. First, we expand the problem from binary multiplication to n-ary multiplication. As a result, we apply parallelism not only to an individual operation but also to the entire equation. Second, we represent a matrix as a relation consisting of (row, col, val) records and translate a multiplication into a join operation, as in database systems. This representation stores sparse matrices efficiently, which are very common in real-world graph data, and makes matrices easy to manipulate. Although this work is still in progress, we conducted a number of experiments to verify the idea. We also discuss current limitations and future work.
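The relational view described above can be illustrated with a minimal single-process sketch. It is not the paper's MapReduce implementation. The function names (`to_relation`, `join_multiply`, `three_way_multiply`) and the example matrices are hypothetical, chosen only for illustration. Each matrix is a list of (row, col, val) records with zeros omitted, and a product is a join on the shared index followed by a group-and-sum.

```python
from collections import defaultdict


def to_relation(dense):
    """Turn a dense list-of-lists matrix into (row, col, val) records, dropping zeros."""
    return [(i, j, v) for i, row in enumerate(dense) for j, v in enumerate(row) if v != 0]


def join_multiply(a, b):
    """Binary product: join on a.col == b.row, then sum products per (a.row, b.col)."""
    b_by_row = defaultdict(list)
    for j, k, w in b:
        b_by_row[j].append((k, w))
    acc = defaultdict(int)
    for i, j, v in a:
        for k, w in b_by_row.get(j, []):
            acc[(i, k)] += v * w
    return sorted((i, k, v) for (i, k), v in acc.items() if v != 0)


def three_way_multiply(a, b, c):
    """A*B*C as one three-way join (a.col == b.row and b.col == c.row), aggregated once."""
    a_by_col = defaultdict(list)
    for i, j, u in a:
        a_by_col[j].append((i, u))
    c_by_row = defaultdict(list)
    for k, l, w in c:
        c_by_row[k].append((l, w))
    acc = defaultdict(int)
    # Drive the join from the middle relation, matching both neighbours at once.
    for j, k, v in b:
        for i, u in a_by_col.get(j, []):
            for l, w in c_by_row.get(k, []):
                acc[(i, l)] += u * v * w
    return sorted((i, l, v) for (i, l), v in acc.items() if v != 0)


# Hypothetical example matrices.
A = to_relation([[1, 2], [0, 3]])
B = to_relation([[0, 1], [4, 0]])
C = to_relation([[2, 0], [1, 5]])

chained = join_multiply(join_multiply(A, B), C)  # two binary multiplications
single = three_way_multiply(A, B, C)             # one three-way join
```

Both `chained` and `single` yield the same relation; the three-way version avoids materializing the intermediate product A*B, which is the kind of saving the n-ary formulation targets.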
Although documents have hundreds of thousands of unique words, only a small number of them are significantly useful for text analysis. Thus, feature selection has become an important issue in various text analysis studies. A number of techniques and algorithms for feature selection are available, but unfortunately, it is hard to say that a certain algorithm outperforms the others, because feature selection results mostly depend on the source documents. We must therefore pick and choose the appropriate algorithm and the best subset of feature words whenever we need to analyze source documents. In this paper, we present a framework named ‘PicAChoo’, which stands for ‘Pick And Choose’, that enables customizable feature selection environments by composing several primitive feature selection methods without hard-coding. As the name indicates, this framework provides many strategies for extracting appropriate features and allows dynamic composition of several feature selection methods. In addition, it aims to give users an environment that utilizes linguistic characteristics of textual data, such as part-of-speech and sentence structure. Finally, we illustrate that the selected feature words can be used for various intelligent services.
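The idea of composing primitive feature selection methods can be sketched as follows. This is only an illustration under stated assumptions, not PicAChoo's actual interface: the selector names (`by_pos`, `min_doc_freq`, `top_k_by_tf`), the `compose` helper, and the toy documents are all hypothetical, and documents are assumed to be already tokenized and part-of-speech tagged as lists of (word, pos) pairs.

```python
from collections import Counter


def by_pos(allowed):
    """Primitive selector (hypothetical): keep words that occur with an allowed POS tag."""
    def select(docs, candidates):
        keep = {w for doc in docs for w, pos in doc if pos in allowed}
        return candidates & keep
    return select


def min_doc_freq(n):
    """Primitive selector (hypothetical): keep words appearing in at least n documents."""
    def select(docs, candidates):
        df = Counter()
        for doc in docs:
            df.update({w for w, _ in doc})  # count each word once per document
        return {w for w in candidates if df[w] >= n}
    return select


def top_k_by_tf(k):
    """Primitive selector (hypothetical): keep the k candidates with highest term frequency."""
    def select(docs, candidates):
        tf = Counter(w for doc in docs for w, _ in doc if w in candidates)
        ranked = sorted(tf, key=lambda w: (-tf[w], w))  # ties broken alphabetically
        return set(ranked[:k])
    return select


def compose(*selectors):
    """Chain selectors so each one narrows the candidate set left by the previous one."""
    def select(docs, candidates=None):
        if candidates is None:
            candidates = {w for doc in docs for w, _ in doc}
        for selector in selectors:
            candidates = selector(docs, candidates)
        return candidates
    return select


# Toy, pre-tagged documents (assumed input format).
docs = [
    [("matrix", "NOUN"), ("join", "NOUN"), ("fast", "ADJ")],
    [("matrix", "NOUN"), ("multiply", "VERB"), ("join", "NOUN"), ("matrix", "NOUN")],
    [("graph", "NOUN"), ("fast", "ADJ")],
]

nouns_in_two_docs = compose(by_pos({"NOUN"}), min_doc_freq(2))
best_noun = compose(by_pos({"NOUN"}), min_doc_freq(2), top_k_by_tf(1))
```

Because each selector shares the same signature, a new pipeline is assembled by listing selectors rather than by writing a new selection routine, which is one plausible reading of "composition without hard-coding".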