Development of noninvasive methods for the diagnosis of transitional cell carcinoma (TCC) of the bladder remains a challenge. A ProteinChip technology (surface…

Bladder cancer is the second most common genitourinary malignancy, accounting for ~5% of all newly diagnosed cancers in the United States [1]. More than 90% are of the transitional cell carcinoma (TCC) histology [2]. At present, the most reliable way of diagnosis and surveillance of TCC is by cystoscopic examination and bladder biopsy for histological confirmation. The invasive and labor-intensive nature of this procedure presents a challenge to develop better, less costly, and noninvasive diagnostic tools. Urine cytology has for many years been the gold standard of the noninvasive approaches. It has high specificity and provides the advantage over biopsy of screening the entire urothelium [2,3]. However, its high false-negative rate, particularly for low-grade tumors, has limited its use as an adjunct to cystoscopy.

Many noninvasive molecular diagnostic tests have been developed based on an ever-increasing knowledge about the molecular alterations associated with bladder cancer pathogenesis. The bladder tumor antigen [4], the bladder tumor antigen stat [5], the fibrinogen/fibrin degradation products [6], and the nuclear matrix protein-22 tests [3,7] have been approved by the Food and Drug Administration to be used in conjunction with cystoscopy. Additional molecular assays currently being evaluated for their diagnostic/prognostic utility [2,3,8,9] are the Telomerase [10], …
With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, the discovery of 'signature' protein profiles specific to each pathologic state (e.g. normal vs. cancer) or differential profiles between experimental conditions (e.g. treated with a drug of interest vs. untreated) from high-dimensional data. We propose a data-analytic strategy for discovering protein biomarkers based on such high-dimensional mass spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of the prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique. Our data-analytic strategy takes properties of the SELDI mass spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points, where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity at the corresponding mass per charge value, x, in that specimen. Given the high coefficients of variation and other characteristics of the protein intensity measures (y values), we reduce the measures of protein intensities to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass per charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After this pre-analysis processing of the data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of the prostate.
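The reduction of intensities to binary peak indicators can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the neighborhood size `half_window`, and the shift tolerance `tol` are hypothetical choices made only for illustration, since the abstract does not state how the neighborhoods or the x-axis shift correction were defined.

```python
import numpy as np

def peak_indicators(y, half_window=5):
    """Return a 0/1 array marking local peaks in the intensities y.

    A point counts as a peak when it is the unique maximum among the
    points within +/- half_window positions (an assumed neighborhood rule).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    peaks = np.zeros(n, dtype=int)
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = y[lo:hi]
        # Require a strict maximum so flat regions are not flagged.
        if y[i] == window.max() and np.sum(window == y[i]) == 1:
            peaks[i] = 1
    return peaks

def tolerate_shift(peaks, tol=1):
    """Mark a position as 'peak present' if any peak lies within tol points.

    This is a crude, assumed allowance for small shifts along the x-axis.
    """
    peaks = np.asarray(peaks)
    n = peaks.size
    out = np.zeros(n, dtype=int)
    for i in np.flatnonzero(peaks):
        out[max(0, i - tol):min(n, i + tol + 1)] = 1
    return out

# Toy spectrum with two peaks (made-up intensities, not SELDI data).
toy_y = [0, 1, 3, 1, 0, 2, 5, 2, 0, 0]
toy_peaks = peak_indicators(toy_y, half_window=2)
toy_widened = tolerate_shift(toy_peaks, tol=1)
```

Applied to every specimen, a step of this kind yields one row of 0/1 predictors per specimen, which is the form of input a classifier built on binary variables needs.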
Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent from the training dataset used to construct the summary classifiers. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.
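As a rough illustration of boosting over binary predictors, the sketch below uses discrete AdaBoost with single-feature rules and then computes sensitivity and specificity on separate test data. It assumes a two-class problem with labels coded as +1 and -1, whereas the project described above involves three states; the abstract does not say which boosting variant was used, and all function names and the toy data are hypothetical.

```python
import numpy as np

def fit_adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost over single binary predictors.

    X: (n, p) array of 0/1 peak indicators; y: labels in {-1, +1}.
    Returns a list of (feature index, polarity, weight alpha).
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n, p = X.shape
    w = np.full(n, 1.0 / n)   # observation weights, kept summing to 1
    H = 2 * X - 1             # each column as a rule: +1 if peak present
    model = []
    for _ in range(n_rounds):
        # Weighted error of every rule with polarity +1.
        err = ((H != y[:, None]) * w[:, None]).sum(axis=0)
        # The flipped rule (polarity -1) has error 1 - err.
        j_pos, j_neg = np.argmin(err), np.argmax(err)
        if err[j_pos] <= 1 - err[j_neg]:
            j, s, e = j_pos, 1, err[j_pos]
        else:
            j, s, e = j_neg, -1, 1 - err[j_neg]
        e = np.clip(e, 1e-10, 1 - 1e-10)   # avoid division by zero
        alpha = 0.5 * np.log((1 - e) / e)
        model.append((int(j), s, float(alpha)))
        # Up-weight the observations this rule misclassified.
        w = w * np.exp(-alpha * y * (s * H[:, j]))
        w /= w.sum()
    return model

def predict(model, X):
    """Sign of the alpha-weighted vote of the selected rules."""
    H = 2 * np.asarray(X) - 1
    score = sum(alpha * s * H[:, j] for j, s, alpha in model)
    return np.where(score >= 0, 1, -1)

def sensitivity_specificity(y_true, y_pred):
    """Fraction of true +1 and of true -1 classified correctly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

# Toy data: predictor 0 is informative, predictor 1 is noise.
X_train = [[1, 0], [1, 1], [0, 0], [0, 1]]
y_train = [1, 1, -1, -1]
X_test, y_test = [[1, 1], [0, 0]], [1, -1]

model = fit_adaboost(X_train, y_train, n_rounds=3)
sens, spec = sensitivity_specificity(y_test, predict(model, X_test))
```

Evaluating on data held out from training, as in the last two lines, mirrors the independent test set described above and keeps the reported sensitivity and specificity from being optimistically biased by the fitting step.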