This paper presents a performance evaluation of shape similarity metrics for 3D video sequences of people with unknown temporal correspondence. The performance of similarity measures is compared by evaluating Receiver Operating Characteristics for classification against ground truth on a comprehensive database of synthetic 3D video sequences comprising animations of fourteen people performing twenty-eight motions. Four static shape similarity metrics (shape distribution, spin image, shape histogram, and spherical harmonics) are evaluated using optimal parameter settings for each approach. Shape histograms with volume sampling are found to consistently give the best performance for different people and motions. Static shape similarity is extended over time to eliminate temporal ambiguity. Time-filtering of the static shape similarity, together with two novel shape-flow descriptors, is evaluated against temporal ground truth. This evaluation demonstrates that shape-flow with multi-frame alignment of motion sequences achieves the best performance, is stable for different people and motions, and overcomes the ambiguity in static shape similarity. Time-filtering of the static shape histogram similarity measure with a fixed window size achieves marginally lower performance for linear motions, at the same computational cost as static shape descriptors. The performance of the temporal shape descriptors is validated on real 3D video sequences of nine actors performing a variety of movements. Time-filtered shape histograms are shown to reliably identify frames from 3D video sequences with similar shape and motion for people with loose clothing and complex motion.
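To illustrate the evaluation protocol above, here is a minimal sketch (not the paper's code; all data and function names are hypothetical) of computing a Receiver Operating Characteristic curve and its area from similarity scores against ground-truth similar/dissimilar labels:

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep a threshold over similarity scores (descending) and accumulate
    true/false positive rates against ground-truth labels (1 = similar pair)."""
    order = np.argsort(-scores)
    labels = np.asarray(labels)[order]
    tpr = np.concatenate([[0.0], np.cumsum(labels) / labels.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - labels) / (1 - labels).sum()])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by trapezoidal integration."""
    return float(np.sum(np.diff(fpr) * (tpr[:-1] + tpr[1:]) / 2))

# Toy example: a good similarity metric ranks truly similar pairs highest
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
labels = np.array([1, 1, 0, 1, 0, 0])
fpr, tpr = roc_points(scores, labels)
print(round(auc(fpr, tpr), 3))  # 0.889
```

The area under the curve equals the probability that a randomly chosen similar pair scores higher than a randomly chosen dissimilar pair, which is what makes it a threshold-free summary for comparing descriptors.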
This paper presents a general approach based on the shape similarity tree for non-sequential alignment across databases of multiple unstructured mesh sequences from non-rigid surface capture. The optimal shape similarity tree for non-rigid alignment is defined as the minimum spanning tree in shape similarity space. Non-sequential alignment based on the shape similarity tree minimises the total non-rigid deformation required to register all frames in a database into a consistent mesh structure with surfaces in correspondence. This allows alignment across multiple sequences of different motions, reduces drift in sequential alignment and is robust to rapid non-rigid motion. Evaluation is performed on three benchmark databases of 3D mesh sequences with a variety of complex human and cloth motion. Comparison with sequential alignment demonstrates reduced errors due to drift and improved robustness to large non-rigid deformation, together with global alignment across multiple sequences which is not possible with previous sequential approaches.
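To make the tree construction concrete, the following is a minimal sketch (one possible implementation, not the paper's code) of Prim's algorithm computing the minimum spanning tree over a pairwise shape dissimilarity matrix, so that every frame joins the tree through its cheapest deformation edge:

```python
import numpy as np

def similarity_tree(dissimilarity):
    """Prim's algorithm: minimum spanning tree over a symmetric frame
    dissimilarity matrix. Returns (parent, child) edges whose total
    cost is the minimum deformation needed to connect all frames."""
    n = len(dissimilarity)
    in_tree = [0]                                   # start from frame 0 (root)
    best_cost = dissimilarity[0].astype(float).copy()
    best_parent = np.zeros(n, dtype=int)
    edges = []
    while len(in_tree) < n:
        best_cost[in_tree] = np.inf                 # never re-add tree members
        j = int(np.argmin(best_cost))               # cheapest frame to attach
        edges.append((int(best_parent[j]), j))
        in_tree.append(j)
        closer = dissimilarity[j] < best_cost       # relax costs via new node
        best_cost = np.where(closer, dissimilarity[j], best_cost)
        best_parent = np.where(closer, j, best_parent)
    return edges

# Toy 3-frame database: frames 0 and 1 are similar, frame 2 is closest to 1
D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 2.0],
              [4.0, 2.0, 0.0]])
print(similarity_tree(D))  # [(0, 1), (1, 2)]
```

The direct edge 0–2 (cost 4) is avoided in favour of routing through frame 1 (total cost 3), which mirrors how the shape similarity tree avoids registering dissimilar frame pairs directly.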
Purpose: To develop a deep learning method for prediction of three-dimensional (3D) voxel-by-voxel dose distributions of helical tomotherapy (HT). Methods: Using previously treated HT plans as training data, a deep learning model named U-ResNet-D was trained to predict a 3D dose distribution. First, the contoured structures and dose volumes were converted from the plan database to 3D matrices with a program based on the Visualization Toolkit (VTK), then transferred to U-ResNet-D for correlating anatomical features and dose distributions at the voxel level. One hundred and ninety nasopharyngeal cancer (NPC) patients treated by HT with multiple planning target volumes (PTVs) in different prescription patterns were studied. The model was trained from scratch with randomly initialized weights rather than by transfer learning, and was used to predict a new patient's 3D dose distribution. The predictive accuracy was evaluated with three methods: (a) the dose difference at position r, δ(r) = Dc(r) − Dp(r), was calculated for each voxel, and the mean (μ_δ) and standard deviation (σ_δ) of δ(r) were calculated to assess the prediction bias and precision; (b) the mean absolute differences of dosimetric indexes (DIs), including maximum and mean dose, homogeneity index, conformity index, and dose spillage for PTVs and organs at risk (OARs), were calculated and statistically analyzed with the paired-samples t test; (c) Dice similarity coefficients (DSC) between predicted and clinical isodose volumes were calculated. Results: The U-ResNet-D model predicted the 3D dose distribution accurately. For twenty tested patients, the prediction bias ranged from −2.0% to 2.3% and the prediction error varied from 1.5% to 4.5% (relative to prescription) for 3D dose differences. The mean absolute dose differences for PTVs and OARs were within 2.0% and 4.2%, respectively, and nearly all DIs for PTVs and OARs showed no significant differences.
The averaged DSC ranged from 0.95 to 1 for different isodose volumes. Conclusions: This study developed a new deep learning method for 3D voxel-by-voxel dose prediction, which was shown to produce accurate dose predictions for nasopharyngeal cancer patients treated by HT. The predicted 3D dose map can be useful for improving radiotherapy planning design, ensuring plan quality and consistency, comparing clinical techniques, and guiding automatic treatment planning.
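To illustrate the evaluation metrics used above, a minimal sketch (hypothetical helper functions, not the study's code) of the voxel-wise dose difference statistics and the Dice similarity coefficient between isodose volumes might look like:

```python
import numpy as np

def dose_difference_stats(d_clinical, d_predicted, prescription):
    """Voxel-wise dose difference delta = Dc - Dp as a percentage of the
    prescription dose; the mean gives bias, the standard deviation precision."""
    delta = 100.0 * (d_clinical - d_predicted) / prescription
    return float(delta.mean()), float(delta.std())

def dice(a, b):
    """Dice similarity coefficient between two binary volumes."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def isodose_volume(dose, level):
    """Binary volume of voxels receiving at least the given dose level."""
    return dose >= level

# Sanity check: identical dose cubes give zero bias and a DSC of 1
dose = np.full((4, 4, 4), 60.0)
bias, spread = dose_difference_stats(dose, dose, prescription=70.0)
print(bias, dice(isodose_volume(dose, 50.0), isodose_volume(dose, 50.0)))
```

Reporting δ relative to the prescription dose, as in the abstract, lets bias and precision be compared across patients with different prescriptions.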
Multiple view 3D video reconstruction of actor performance captures a level-of-detail for body and clothing movement which is time-consuming to produce using existing animation tools. In this paper we present a framework for concatenative synthesis from multiple 3D video sequences according to user constraints on movement, position and timing. Multiple 3D video sequences of an actor performing different movements are automatically constructed into a surface motion graph which represents the possible transitions with similar shape and motion between sequences without unnatural movement artifacts. Shape similarity over an adaptive temporal window is used to identify transitions between 3D video sequences. Novel 3D video sequences are synthesized by finding the optimal path in the surface motion graph between user specified key-frames for control of movement, location and timing. The optimal path which satisfies the user constraints whilst minimizing the total transition cost between 3D video sequences is found using integer linear programming. Results demonstrate that this framework allows flexible production of novel 3D video sequences which preserve the detailed dynamics of the captured movement for an actress with loose clothing and long hair without visible artifacts.
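The paper finds the constrained optimal path with integer linear programming; as a simplified sketch only (ignoring the timing and key-frame constraints that make ILP necessary, and with hypothetical graph contents), an unconstrained minimum-transition-cost path through a motion graph can be found with Dijkstra's algorithm:

```python
import heapq

def min_cost_path(graph, start, goal):
    """Dijkstra over a surface motion graph: nodes are 3D video sequences or
    frames, weighted edges are transitions with a shape-similarity cost."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]

# Toy graph: the direct walk -> run transition is costly (dissimilar shape),
# so the optimal path routes through an intermediate turn sequence
graph = {
    "walk": [("turn", 0.25), ("run", 1.5)],
    "turn": [("run", 0.25)],
}
print(min_cost_path(graph, "walk", "run"))  # (['walk', 'turn', 'run'], 0.5)
```

The ILP formulation in the paper generalises this by adding integer variables for loop counts and constraints on total duration and end position, which a plain shortest-path search cannot express.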
We present a novel hybrid representation for character animation from 4D Performance Capture (4DPC) data which combines skeletal control with surface motion graphs. 4DPC data are temporally aligned 3D mesh sequence reconstructions of the dynamic surface shape and associated appearance from multiple view video. The hybrid representation supports the production of novel surface sequences which satisfy constraints from user specified key-frames or a target skeletal motion. Motion graph path optimisation concatenates fragments of 4DPC data to satisfy the constraints whilst maintaining plausible surface motion at transitions between sequences. Spacetime editing of the mesh sequence using a learnt part-based Laplacian surface deformation model is performed to match the target skeletal motion and transition between sequences. The approach is quantitatively evaluated for three 4DPC datasets with a variety of clothing styles. Results for key-frame animation demonstrate production of novel sequences which satisfy constraints on timing and position of less than 1% of the sequence duration and path length. Evaluation of motion capture driven animation over a corpus of 130 sequences shows that the synthesised motion accurately matches the target skeletal motion. The combination of skeletal control with the surface motion graph extends the range and style of motion which can be produced whilst maintaining the natural dynamics of shape and appearance from the captured performance.
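As a much-simplified illustration of the surface deformation step (a uniform Laplacian with soft positional constraints, not the learnt part-based model described above; all names are hypothetical), Laplacian mesh editing can be sketched as a linear least-squares solve that preserves local surface detail while moving constrained vertices:

```python
import numpy as np

def laplacian_deform(vertices, neighbors, constraints, weight=100.0):
    """Solve a least-squares system for new vertex positions that preserve the
    uniform Laplacian (differential) coordinates of the mesh while softly
    pulling constrained vertices towards target positions.
    constraints: {vertex_index: target_position}."""
    n = len(vertices)
    L = np.eye(n)
    for i, nbrs in neighbors.items():
        for j in nbrs:
            L[i, j] -= 1.0 / len(nbrs)
    delta = L @ vertices                          # local detail to preserve
    rows, rhs = [L], [delta]
    for i, target in constraints.items():         # soft positional constraints
        row = np.zeros((1, n))
        row[0, i] = weight
        rows.append(row)
        rhs.append(weight * np.asarray(target, dtype=float)[None, :])
    new_vertices, *_ = np.linalg.lstsq(np.vstack(rows), np.vstack(rhs), rcond=None)
    return new_vertices

# Toy "mesh": three vertices on a line; lift the right endpoint upwards
V = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
nbrs = {0: [1], 1: [0, 2], 2: [1]}
V2 = laplacian_deform(V, nbrs, {0: [0.0, 0.0, 0.0], 2: [2.0, 1.0, 0.0]})
```

The unconstrained middle vertex follows the lifted endpoint smoothly (here to roughly half its height), which is the behaviour that makes Laplacian editing suitable for matching a mesh sequence to a target skeletal motion without destroying surface detail.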
Purpose: The purpose of this study is to develop a deep learning (DL) method for producing four-dimensional computed tomography (4DCT) ventilation imaging and to evaluate the accuracy of the DL-based ventilation imaging against single-photon emission computed tomography (SPECT) ventilation imaging (SPECT-VI). The performance of the DL-based method is assessed by comparison with density-change-based and Jacobian-based (HU and JAC) methods. Materials and methods: Fifty patients with esophageal or lung cancer who underwent thoracic radiotherapy were enrolled in this study. For each patient, 4DCT scans paired with 99mTc-Technegas SPECT/CT were acquired before the first radiotherapy treatment. The 4DCT and SPECT/CT were first rigidly registered using MIMvista and converted to data matrices using MATLAB, then transferred to a DL model based on U-net for correlating 4DCT features with SPECT-VI. Two forms of 4DCT input dataset were studied: (a) all ten phases and (b) the two phases of peak exhalation and peak inhalation. A tenfold cross-validation procedure was used to evaluate the performance of the DL model. For comparative evaluation, the HU and JAC methodologies were used to calculate ventilation imaging based on 4DCT (CTVI) for each patient. The voxel-wise Spearman's correlation was evaluated over the whole lung between each CTVI and the corresponding SPECT-VI. The SPECT-VI and produced CTVIs were segmented into high, median, and low functional lung (HFL, MFL, and LFL) regions. The spatial overlap of the corresponding HFL, MFL, and LFL for each CTVI against SPECT-VI was also evaluated using the Dice similarity coefficient (DSC). The averaged DSC of functional lung regions was calculated and statistically analyzed with a one-factor ANOVA model among the different methods. Results: The voxel-wise Spearman r_s values were (0.22 ± 0.31), (−0.09 ± 0.18), and (0.73 ± 0.16)/(0.71 ± 0.17) for CTVI_HU, CTVI_JAC, and CTVI_DL(1)/CTVI_DL(2), respectively.
These results show that the DL method yielded the strongest correlation with SPECT-VI. Using the DSC as the spatial overlap metric, we found that the CTVI_HU, CTVI_JAC, and CTVI_DL(1)/CTVI_DL(2) methods achieved averaged DSC values over all patients of (0.45 ± 0.08), (0.33 ± 0.04), and (0.73 ± 0.09)/(0.71 ± 0.09), respectively. The results demonstrate that the DL method yielded the highest similarity with SPECT-VI, with a highly significant difference (P < 10^−7). Conclusions: This study developed a DL method for producing CTVI and validated it against SPECT-VI. The results demonstrate that the DL method can derive CTVI with greatly improved accuracy compared to the HU and JAC methods. The produced ventilation images can be more accurate and useful for lung functional avoidance radiotherapy and treatment response modeling.
In this paper we consider the problem of aligning multiple non-rigid surface mesh sequences into a single temporally consistent representation of the shape and motion. A global alignment graph structure is introduced which uses shape similarity to identify frames for inter-sequence registration. Graph optimisation is performed to minimise the total non-rigid deformation required to register the input sequences into a common structure. The resulting global alignment ensures that all input sequences are resampled with a common mesh structure which preserves the shape and temporal correspondence. Results demonstrate temporally consistent representation of several public databases of mesh sequences for multiple people performing a variety of motions with loose clothing and hair.