“…To compute the geodesic distances, we need to decide on the number of nearest neighbors, k. If k is too large, it creates short-circuit edges that bypass the true geometry of the manifold, which reflects the nonlinear structure of the data; if k is too small, it causes the manifold to fragment into a large number of disconnected clusters. Following Samko et al. (2006), we choose k by maximizing |ρ(D, Φ_{k,p})|, where D and Φ_{k,p} are the matrices of Euclidean distances between pairs of points in the original space and the feature space, respectively, and ρ(·, ·) is the linear correlation coefficient. Note that Φ_{k,p} depends on p, the dimension of the embedding space.…”
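
The selection rule in the passage can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes an Isomap-style pipeline: a symmetric k-nearest-neighbor graph, shortest-path geodesics, and classical MDS. The function names, the toy spiral data, and the candidate range of k are all hypothetical choices made for the example. Values of k that leave the neighborhood graph disconnected are skipped, which corresponds to the "k too small" case.

```python
import numpy as np

def pairwise_euclidean(X):
    # Matrix of Euclidean distances between all pairs of rows of X.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def geodesic_distances(D, k):
    # Approximate geodesics as shortest paths on the symmetric k-NN graph.
    n = D.shape[0]
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nn = np.argsort(D, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    G[rows, nn.ravel()] = D[rows, nn.ravel()]
    G = np.minimum(G, G.T)                   # make the graph undirected
    for m in range(n):                       # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    return G

def classical_mds(G, p):
    # Embed a distance matrix into p dimensions via double centering.
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:p]            # p largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

def choose_k(X, p, k_values):
    # Pick the k that maximizes |rho(D, Phi_{k,p})| over the candidates.
    D = pairwise_euclidean(X)
    iu = np.triu_indices_from(D, k=1)        # each pair of points once
    best_k, best_score = None, -np.inf
    for k in k_values:
        G = geodesic_distances(D, k)
        if not np.isfinite(G).all():
            continue                         # graph fragmented: k too small
        Phi = pairwise_euclidean(classical_mds(G, p))
        score = abs(np.corrcoef(D[iu], Phi[iu])[0, 1])
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Toy data (an assumption for illustration): points along a 2-D spiral.
t = np.linspace(0.0, 3.0 * np.pi, 60)
X = np.column_stack([t * np.cos(t), t * np.sin(t)])
best_k, best_score = choose_k(X, p=1, k_values=range(2, 11))
print(best_k, best_score)
```

Because Φ_{k,p} depends on p, the search would be repeated for each embedding dimension under consideration.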