Multidimensional Scaling by Majorization: A Review

Groenen, Patrick J. F.; Velden, Michel van de

doi:10.18637/jss.v073.i08

Cited by 21 publications

(12 citation statements)

References 22 publications

“…Weights are useful when we have input data with missing values. Since there is no restriction on any distance X, we can define fixed values of w_ij = 0 if δ_ij is missing and w_ij = 1 otherwise [20].…”

Section: A Multidimensional Scaling (mentioning)

confidence: 99%

Generating Name-Like Vectors for Testing Large-Scale Entity Resolution

2021

Accurate and efficient entity resolution (ER) has been a problem in data analysis and data mining projects for decades. In our work, we are interested in developing ER methods to handle big data. Good public datasets are restricted in this area and usually small in size. Simulation is one technique for generating datasets for testing. Existing simulation tools have problems of complexity, scalability and limitations of resampling. We address these problems by introducing a better way of simulating testing data for big data ER. Our proposed simulation model is simple, inexpensive and fast. We focus on avoiding the detail-level simulation of records using a simple vector representation. In this paper, we will discuss how to simulate simple vectors that approximate the properties of names (commonly used as identification keys).

“…LSMDS initially maps each item in the non-metric or metric-space to a K-dimensional point. Then minimises the discrepancy between the actual dissimilarities and the estimated distances in the K-dimensional space by optimisation [13]. This discrepancy is measured using raw stress (σ_raw) given by the relative error where δ_ij is the dissimilarity between the two objects and d_ij is the Euclidean distance between their estimated points.…”

Section: Problem Formulation (mentioning)

confidence: 99%

“…Possible weights for each pair of points are denoted by w_ij. Weights are useful in handling missing values and the default values are w_ij = 0, if δ_ij is missing and w_ij = 1, otherwise [13]. We do not apply weights in this work, hence, w_ij = 1 always.…”

Section: Problem Formulation (mentioning)

confidence: 99%
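
The two statements above describe raw stress and the 0/1 weighting for missing dissimilarities. A minimal sketch of how these fit together, assuming the commonly used weighted form (the sum over pairs i < j of w_ij (δ_ij − d_ij)^2) rather than the exact formula of the citing paper, which is not reproduced in the excerpt. The function name `weighted_raw_stress` and the use of `None` to mark a missing δ_ij are illustrative choices, not taken from any of the cited works.

```python
import math


def weighted_raw_stress(points, delta):
    """Sum of w_ij * (delta_ij - d_ij)^2 over all pairs i < j.

    Assumption: delta[i][j] is None when the dissimilarity is missing,
    which plays the role of w_ij = 0; every observed pair has w_ij = 1.
    """
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            if delta[i][j] is None:
                # w_ij = 0: a missing pair contributes nothing to the stress.
                continue
            d_ij = math.dist(points[i], points[j])  # Euclidean distance
            total += (delta[i][j] - d_ij) ** 2      # w_ij = 1
    return total


# Three points in the plane; the dissimilarity between objects 0 and 2 is missing.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
delta = [
    [0.0, 3.0, None],
    [3.0, 0.0, 6.0],
    [None, 6.0, 0.0],
]
stress = weighted_raw_stress(points, delta)
```

Only the observed pairs (0, 1) and (1, 2) enter the sum here, so the position of the points is unconstrained by the missing pair (0, 2).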

Em-K Indexing for Approximate Query Matching in Large-scale ER

Herath, Roughan, Glonek

2021

Preprint

Accurate and efficient entity resolution (ER) is a significant challenge in many data mining and analysis projects requiring integrating and processing massive data collections. It is becoming increasingly important in real-world applications to develop ER solutions that produce prompt responses for entity queries on large-scale databases. Some of these applications demand entity query matching against large-scale reference databases within a short time. We define this as the query matching problem in ER in this work. Indexing or blocking techniques reduce the search space and execution time in the ER process. However, approximate indexing techniques that scale to very large-scale datasets remain open to research. In this paper, we investigate the query matching problem in ER to propose an indexing method suitable for approximate and efficient query matching. We first use spatial mappings to embed records in a multidimensional Euclidean space that preserves the domain-specific similarity. Among the various mapping techniques, we choose multidimensional scaling. Then using a Kd-tree and the nearest neighbour search, the method returns a block of records that includes potential matches for a query. Our method can process queries against a large-scale dataset using only a fraction of the data L (given the dataset size is N), with an O(L^2) complexity where L ≪ N. The experiments conducted on several datasets showed the effectiveness of the proposed method.

“…The non-negative weights w_{i,j} in (39) were originally included and suggested by De Leeuw to provide more flexibility. They can be used to express the importance of the residualŝ (X) or can be used to handle missing data (Groenen and van de Velden 2016). For multidimensional unfolding, the configuration matrix X can be decomposed in two matrices X_1 and X_2, which are of dimensionality n_1 × p and n_2 × p, respectively.…”

Section: Dyadic Unfolding With Smacof (mentioning)

confidence: 99%
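
The statement above splits the configuration into two blocks for unfolding. A minimal sketch of what that implies for the stress computation, assuming unit weights and assuming dissimilarities are observed only between the two sets (equivalently, within-set pairs carry zero weight in the full square problem). The function name `unfolding_raw_stress` and the argument names `x1`, `x2`, `delta` are hypothetical and do not come from the smacof package or the cited paper.

```python
import math


def unfolding_raw_stress(x1, x2, delta):
    """Raw stress summed over between-set pairs only.

    x1: list of n1 points, x2: list of n2 points, all p-dimensional.
    delta: rectangular n1 x n2 table of dissimilarities.
    Assumption: every between-set pair has weight 1.
    """
    total = 0.0
    for i, row_point in enumerate(x1):
        for j, col_point in enumerate(x2):
            d_ij = math.dist(row_point, col_point)  # Euclidean distance
            total += (delta[i][j] - d_ij) ** 2
    return total


x1 = [(0.0, 0.0), (0.0, 3.0)]              # n1 = 2 row objects, p = 2
x2 = [(4.0, 0.0), (4.0, 3.0), (0.0, 1.0)]  # n2 = 3 column objects
delta = [
    [4.0, 5.0, 1.0],
    [5.0, 4.0, 3.0],
]
stress = unfolding_raw_stress(x1, x2, delta)
```

Distances within `x1` or within `x2` never appear in the sum, which is the sense in which unfolding can be read as multidimensional scaling with missing within-set dissimilarities.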

Dyad ranking using Plackett–Luce models based on joint feature representations

Schäfer, Hüllermeier

2018

Mach Learn

Label ranking is a specific type of preference learning problem, namely the problem of learning a model that maps instances to rankings over a finite set of predefined alternatives. Like in conventional classification, these alternatives are identified by their name or label while not being characterized in terms of any properties or features that could be potentially useful for learning. In this paper, we consider a generalization of the label ranking problem that we call dyad ranking. In dyad ranking, not only the instances but also the alternatives are represented in terms of attributes. For learning in the setting of dyad ranking, we propose an extension of an existing label ranking method based on the Plackett-Luce model, a statistical model for rank data. This model is combined with a suitable feature representation of dyads. Concretely, we propose a method based on a bilinear extension, where the representation is given in terms of a Kronecker product, as well as a method based on neural networks, which allows for learning a (highly nonlinear) joint feature representation. The usefulness of the additional information provided by the feature description of alternatives is shown in several experimental studies. Finally, we propose a method for the visualization of dyad rankings, which is based on the technique of multidimensional unfolding.
