2005
DOI: 10.1109/tpami.2005.192
Building k edge-disjoint spanning trees of minimum total length for isometric data embedding

Abstract: Isometric data embedding requires the construction of a neighborhood graph that spans all data points, so that the geodesic distance between any pair of data points can be estimated by the distance along the shortest path between the pair on the graph. This paper presents an approach for constructing k-edge-connected neighborhood graphs. It works by finding k edge-disjoint spanning trees the sum of whose total lengths is a minimum. Experiments show that it outperforms the nearest neighbor approach for geodesic distance e…
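The construction described in the abstract can be approximated with a simple greedy heuristic: build a minimum spanning tree over the complete Euclidean graph, remove its edges, and repeat k times. This is only a sketch of the idea — the paper minimizes the summed total length of all k trees jointly, whereas the greedy loop below does not guarantee that optimum; the function names are illustrative, not from the paper.

```python
import math
from itertools import combinations

def kruskal_mst(n, edges, forbidden):
    """Kruskal's MST over n nodes, skipping edges used by earlier trees."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(edges):
        if (u, v) in forbidden:
            continue
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
            if len(tree) == n - 1:
                break
    return tree

def k_edge_disjoint_spanning_trees(points, k):
    """Greedy sketch: repeatedly extract an MST from the complete graph
    and forbid its edges, yielding k edge-disjoint spanning trees.
    (Heuristic only; the paper optimizes the total length of all k trees.)"""
    n = len(points)
    edges = [(math.dist(points[u], points[v]), u, v)
             for u, v in combinations(range(n), 2)]
    used, trees = set(), []
    for _ in range(k):
        tree = kruskal_mst(n, edges, used)
        trees.append(tree)
        used.update(tree)
    return trees
```

For points in general position the greedy loop can fail to span on later rounds (e.g. if an MST is a star, its center becomes isolated), which is one reason a joint optimization as in the paper is preferable.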

Cited by 55 publications (34 citation statements)
References 8 publications
“…Besides the embedding results, the residual variance is also taken as an evaluation criterion (Tenenbaum et al., 2000; Geng et al., 2005; Yang, 2005, 2006). Residual variance is defined as 1 − R²(D_Y, D_G), where D_Y is the matrix of Euclidean distances between data points after embedding, D_G is the matrix of estimated geodesic distances, and R denotes the correlation coefficient.…”
Section: Results
confidence: 99%
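The residual-variance criterion in the statement above is a one-liner: flatten both distance matrices, take their Pearson correlation, and subtract its square from 1. A minimal sketch (the function name is ours, not from the cited papers):

```python
import numpy as np

def residual_variance(D_Y, D_G):
    """Residual variance 1 - R^2(D_Y, D_G), where R is the correlation
    coefficient between embedded Euclidean distances D_Y and estimated
    geodesic distances D_G."""
    r = np.corrcoef(np.asarray(D_Y).ravel(), np.asarray(D_G).ravel())[0, 1]
    return 1.0 - r ** 2
```

When the embedding reproduces the geodesic distances up to a scale factor, R = 1 and the residual variance is 0; larger values indicate a poorer embedding.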
“…The second method is the k-nearest-neighbors approach, which has been extensively applied and improved in many ways. The transitive closure of the neighbors of any data point is required to cover all data points; otherwise, the information on the relative positions of the connected components would be lost. This is why effective approaches have been proposed to build a connected neighborhood graph (Yang, 2005, 2006). Since the Euclidean distance cannot discover neighborhoods of arbitrary shape, the geodesic distance and path algebra have been applied to determine whether two points are neighbors (Varini et al., 2006).…”
Section: Related Work
confidence: 99%
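The connectivity requirement in the statement above is easy to check in code: build a symmetric k-NN graph and test whether every point is reachable from any one point. A minimal sketch assuming plain Euclidean points (the helper names are ours):

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph as an adjacency map
    {index -> set of neighbor indices}."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        nbrs = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))
        for j in nbrs[1:k + 1]:  # nbrs[0] is the point itself
            adj[i].add(j)
            adj[j].add(i)
    return adj

def is_connected(adj):
    """BFS/DFS reachability: a disconnected neighborhood graph loses the
    relative positions of its components, as the citing passage notes."""
    seen, stack = {0}, [0]
    while stack:
        for j in adj[stack.pop()]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(adj)
```

For well-separated clusters and small k the graph splits into components, which is precisely the failure mode that motivates the paper's k-edge-connected construction.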
“…(multi-dimensional scaling) method to obtain the globally optimal geometric structure, with good results; many improved algorithms have since been developed, such as kernel-based ISOMAP, supervised ISOMAP [3], and incremental ISOMAP [4]. LLE preserves the local geometric structure during the dimension-reducing embedding and avoids local minima, finally obtaining a global low-dimensional embedding, also with good results. Current improvements include HLLE (Hessian LLE), which uses the Hessian transform [5], supervised LLE, which uses class information, incremental LLE [6], and Fisher-improved LLE [7]. Substantial theoretical research and applications have also been carried out domestically [8] — for example, a proof of the existence of an isometric mapping between a continuous manifold and its low-dimensional parameter space in ISOMAP [9], and studies linking high-dimensional observations to their low-dimensional parameter-space data via magnification factors and extension directions [10]. ISOMAP's basic assumptions are a globally isometric mapping and a convex parameter space, which are hard to satisfy in many cases; HLLE only requires a locally isometric mapping and an open, connected parameter space, giving it a wider range of application. Like ISOMAP, however, it depends heavily on whether the local neighborhoods correctly reflect the intrinsic structure of the manifold. Existing k-nearest-neighbor methods for determining neighborhoods easily produce distorted neighborhood structures on sparse and noisy data, leading to the short-circuit phenomenon [11]. A short circuit occurs when folded sheets of the manifold lie very close together, so that the neighborhoods of some points come from different sheets and are therefore not nearest neighbors on the manifold; this often causes significant performance deviations, so neighborhood optimization is needed. Neighborhood-optimization methods include repeatedly extracting minimum spanning trees from the complete graph to construct a connected neighborhood graph [12], which guarantees that the relative positions of the data are not lost after dimension reduction. Another method redefines distances using class information and then uses the new distances to determine neighborhoods [3]; its drawback is that it does not apply to data without class information. There has also been research on automatically selecting the best neighborhood size using residuals and linear reconstruction coefficients [13−15], but once selected, the neighborhood size is still the same for every data point. Yet another method selects a preliminary neighborhood for each point, constructs the principal linear subspace of this neighborhood using PCA (principal component analysis), and then removes from the neighborhood the points that deviate from that subspace [16]; when the neighborhood is essentially nonlinear this method may not apply, and its many parameters make it hard to use. [17], and optimizing neighborhoods using graph algebra [18], among others — but the neighborhood size remains globally uniform. Since HLLE needs to keep local regions linear, a globally uniform neighborhood size is hard to satisfy when the data manifold is non-uniformly distributed: if the neighborhood parameter is too large, small-scale structures of the manifold are easily erased and the short-circuit problem becomes unavoidable; if too small, the manifold easily splits [19]. We therefore previously proposed recursively decomposing the whole non-uniformly distributed manifold into approximately uniformly distributed submanifolds and automatically computing the neighborhood size of each submanifold, thereby improving LLE [20], but it requires computing the geodesic distances between all points. 2) The local geodesic distance between any two points in the local data set X_i is computed with the ISOMAP method, which mainly comprises two steps:…”
unclassified
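The ISOMAP geodesic-distance step that the passage above breaks off on — estimating geodesic distances as shortest-path distances along a neighborhood graph — can be sketched with Floyd–Warshall. This is a generic illustration, not the cited paper's code; `adj` is assumed to be an adjacency map from point index to a set of neighbor indices:

```python
import math

def geodesic_distances(points, adj):
    """Estimate geodesic distances as shortest-path distances along the
    neighborhood graph (Floyd-Warshall), as in ISOMAP's second step."""
    n = len(points)
    INF = float("inf")
    D = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    # Edge weights: Euclidean distances between graph neighbors.
    for i, nbrs in adj.items():
        for j in nbrs:
            D[i][j] = D[j][i] = math.dist(points[i], points[j])
    # Relax paths through every intermediate node m.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][m] + D[m][j] < D[i][j]:
                    D[i][j] = D[i][m] + D[m][j]
    return D
```

Floyd–Warshall is O(n³); on larger local data sets Dijkstra from each source would be the usual choice, but the cubic version keeps the sketch short.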