Ying Xu scite author profile

A new method for fold recognition is developed and added to the general protein structure prediction package PROSPECT (http://compbio.ornl.gov/PROSPECT/). The new method (PROSPECT II) has four key features. (i) We have developed an efficient way to utilize the evolutionary information for evaluating the threading potentials including singleton and pairwise energies. (ii) We have developed a two-stage threading strategy: (a) threading using dynamic programming without considering the pairwise energy and (b) fold recognition considering all the energy terms, including the pairwise energy calculated from the dynamic programming threading alignments. (iii) We have developed a combined z-score scheme for fold recognition, which takes into consideration the z-scores of each energy term. (iv) Based on the z-scores, we have developed a confidence index, which measures the reliability of a prediction and a possible structure-function relationship based on a statistical analysis of a large data set consisting of threadings of 600 query proteins against the entire FSSP templates. Tests on several benchmark sets indicate that the evolutionary information and other new features of PROSPECT II greatly improve the alignment accuracy. We also demonstrate that the performance of PROSPECT II on fold recognition is significantly better than any other method available at all levels of similarity. Improvement in the sensitivity of the fold recognition, especially at the superfamily and fold levels, makes PROSPECT II a reliable and fully automated protein structure and function prediction program for genome-scale applications.

Estimating corpus size via queries

Broder

Fontoura

Josifovski

et al. 2006

We consider the problem of estimating the size of a collection of documents using only a standard query interface. Our main idea is to construct an unbiased and low-variance estimator that can closely approximate the size of any set of documents defined by certain conditions, including that each document in the set must match at least one query from a uniformly sampleable query pool of known size, fixed in advance. Using this basic estimator, we propose two approaches to estimating corpus size. The first approach requires a uniform random sample of documents from the corpus. The second approach avoids this notoriously difficult sample generation problem, and instead uses two fairly uncorrelated sets of terms as query pools; the accuracy of the second approach depends on the degree of correlation between the two sets of terms. Experiments on a large TREC collection and on three major search engines demonstrate the effectiveness of our algorithms.
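The basic estimator described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes one natural form of such an estimator, in which each document returned for a sampled query is weighted by the reciprocal of the number of pool queries it matches. The function names, the toy corpus, and the single-term `matches` stand-in for a search interface are all hypothetical.

```python
import random


def matches(query, doc):
    """Toy stand-in for a search interface: does the term occur in the document?"""
    return query in doc.split()


def estimate_covered_docs(corpus, pool, sampled_queries):
    """Estimate how many documents match at least one query in the pool.

    Assumed estimator form: every document returned for a sampled query
    contributes 1 / (number of pool queries that document matches). Averaging
    over uniformly sampled queries and scaling by the pool size is unbiased,
    because each covered document contributes exactly 1 over the full pool.
    """
    total = 0.0
    for query in sampled_queries:
        for doc in corpus:
            if matches(query, doc):
                degree = sum(1 for p in pool if matches(p, doc))
                total += 1.0 / degree
    return len(pool) * total / len(sampled_queries)


# Hypothetical toy data: five of the six documents match some pool term.
corpus = ["apple banana", "banana cherry", "cherry",
          "date apple banana", "zebra", "apple"]
pool = ["apple", "banana", "cherry", "date"]

rng = random.Random(0)
sample = [rng.choice(pool) for _ in range(2000)]
sampled_estimate = estimate_covered_docs(corpus, pool, sample)
print(sampled_estimate)
```

Passing the whole pool as `sampled_queries` removes the sampling noise and returns the exact number of covered documents, which is a convenient sanity check for the weighting.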

Link privacy in social networks

Korolova

Motwani

Nabar

et al. 2008

We consider a privacy threat to a social network in which the goal of an attacker is to obtain knowledge of a significant fraction of the links in the network. We formalize the typical social network interface and the information about links that it provides to its users in terms of lookahead. We consider a particular threat where an attacker subverts user accounts to get information about local neighborhoods in the network and pieces them together in order to get a global picture. We analyze, both experimentally and theoretically, the number of user accounts an attacker would need to subvert for a successful attack, as a function of his strategy for choosing users whose accounts to subvert and a function of lookahead provided by the network. We conclude that such an attack is feasible in practice, and thus any social network that wishes to protect the link privacy of its users should take great care in choosing the lookahead of its interface, limiting it to 1 or 2, whenever possible.
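To make the notion of lookahead concrete, here is a minimal sketch that counts how many links a set of subverted accounts reveals. It assumes that lookahead l means an account can see every edge incident to a node within distance l of itself, so lookahead 0 exposes only the account's own links. That reading, the function names, and the toy path graph are assumptions for illustration, not taken from the paper.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def nodes_within(adj, source, radius):
    """Breadth-first search returning all nodes at distance <= radius."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the lookahead radius
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def revealed_fraction(edges, compromised, lookahead):
    """Fraction of distinct edges visible from the compromised accounts."""
    adj = build_adjacency(edges)
    revealed = set()
    for account in compromised:
        for node in nodes_within(adj, account, lookahead):
            for nb in adj[node]:
                revealed.add(frozenset((node, nb)))
    return len(revealed) / len(edges)


# Hypothetical toy network: a path 0-1-2-3-4 with four links.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
for lookahead in range(3):
    print(lookahead, revealed_fraction(path_edges, [0], lookahead))
```

On this path, each extra unit of lookahead exposes one more link from a single subverted account, which is the effect the abstract warns about when it recommends keeping lookahead small.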

Link Privacy in Social Networks

Korolova

Motwani

Nabar

et al. 2008

Stochastic Kronecker Graphs

Mahdian

A random graph model based on Kronecker products of probability matrices has been recently proposed as a generative model for large-scale real-world networks such as the web. This model simultaneously captures several well-known properties of real-world networks; in particular, it gives rise to a heavy-tailed degree distribution, has a low diameter, and obeys the densification power law. Most properties of Kronecker products of graphs (such as connectivity and diameter) are only rigorously analyzed in the deterministic case. In this article, we study the basic properties of stochastic Kronecker products based on an initiator matrix of size two (which is the case that is shown to provide the best fit to many real-world networks). We will show a phase transition for the emergence of the giant component and another phase transition for connectivity, and prove that such graphs have constant diameters beyond the connectivity threshold, but are not searchable using a decentralized algorithm.
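The generative model in this abstract can be sketched in a few lines. The sketch assumes the usual construction: with a 2x2 initiator matrix and k Kronecker powers, the graph has 2^k nodes, and the probability of an edge between two nodes is the product of initiator entries indexed by the corresponding bits of the two node labels. The initiator values, the function names, and the choice to sample an undirected graph from a symmetric initiator are illustrative assumptions.

```python
import random


def edge_probability(u, v, theta, k):
    """Entry (u, v) of the k-th Kronecker power of the 2x2 matrix theta.

    Each of the k bit positions of u and v selects one factor theta[i][j].
    """
    prob = 1.0
    for bit in range(k):
        prob *= theta[(u >> bit) & 1][(v >> bit) & 1]
    return prob


def sample_graph(theta, k, rng):
    """Sample an undirected graph on 2**k nodes, one coin flip per node pair.

    Assumes theta is symmetric, so the edge probability does not depend on
    the order of the two endpoints.
    """
    n = 1 << k
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < edge_probability(u, v, theta, k):
                edges.append((u, v))
    return edges


# Hypothetical symmetric initiator matrix; the values are placeholders.
theta = [[0.9, 0.5],
         [0.5, 0.2]]
graph_edges = sample_graph(theta, 4, random.Random(1))
print(len(graph_edges), "edges on", 1 << 4, "nodes")
```

This direct sampler examines every node pair, so it is only practical for small k; it is meant to show the probability structure rather than to scale.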

A computational method for assessing peptide-identification reliability in tandem mass spectrometry analysis with SEQUEST

Razumovskaya

Olman

et al.

Efficient Theoretic and Practical Algorithms for Linear Matroid Intersection Problems

Gabow

1996

Journal of Computer and System Sciences

Efficient algorithms for the matroid intersection problem, both cardinality and weighted versions, are presented. The algorithm for weighted intersection works by scaling the weights. The cardinality algorithm is a special case, but takes advantage of greater structure. Efficiency of the algorithms is illustrated by several implementations on linear matroids. Consider a linear matroid with m elements and rank n. Assume all element weights are integers of magnitude at most N. Our fastest algorithms use time O(mn^{1.77} log(nN)) and O(mn^{1.62}) for weighted and unweighted intersection, respectively; this improves the previous best bounds, O(mn^{2.4}) and O(mn^2 log n), respectively. Corresponding improvements are given for several applications of matroid intersection to numerical computation and dynamic systems.

Ying Xu

Haplotype inference by maximum parsimony

PROSPECT II: protein structure prediction program for genome-scale applications
