Siu Ming Yiu scite author profile

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

Redefining the structural motifs that determine RNA binding and RNA editing by pentatricopeptide repeat proteins in land plants

Cheng

et al. 2016

These authors contributed equally to the manuscript. Summary: The pentatricopeptide repeat (PPR) proteins form one of the largest protein families in land plants. They are characterised by tandem 30-40 amino acid motifs that form an extended binding surface capable of sequence-specific recognition of RNA strands. Almost all of them are post-translationally targeted to plastids and mitochondria, where they play important roles in post-transcriptional processes including splicing, RNA editing and the initiation of translation. A code describing how PPR proteins recognise their RNA targets promises to accelerate research on these proteins, but making use of this code requires accurate definition and annotation of all of the various nucleotide-binding motifs in each protein. We have used a structural modelling approach to define 10 different variants of the PPR motif found in plant proteins, in addition to the putative deaminase motif that is found at the C-terminus of many RNA-editing factors. We show that the super-helical RNA-binding surface of RNA-editing factors is potentially longer than previously recognised. We used the redefined motifs to develop accurate and consistent annotations of PPR sequences from 109 genomes. We report a high error rate in PPR gene models in many public plant proteomes, due to gene fusions and insertions of spurious introns. These consistently annotated datasets across a wide range of species are valuable resources for future comparative genomics studies, and an essential pre-requisite for accurate large-scale computational predictions of PPR targets. We have created a web portal (http://www.plantppr.com) that provides open access to these resources for the community.

Erratum: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler

Luo

Liu

Xie

et al. 2015

GigaSci

192

123

SPECS: Secure and Privacy Enhancing Communications Schemes for VANETs

Chim

Yiu

Hui

et al. 2010

111

A vehicular ad hoc network (VANET) is an emerging type of network that enables vehicles on roads to communicate for driving safety. The basic idea is to allow arbitrary vehicles to broadcast ad hoc messages (e.g. traffic accidents) to other vehicles. However, this raises security and privacy concerns. Messages should be signed and verified before they are trusted, while the real identity of vehicles should not be revealed but should remain traceable by an authorized party. Existing solutions either rely heavily on a tamper-proof hardware device, or cannot satisfy the privacy requirement and do not have an effective message verification scheme. In this paper, we provide a software-based solution which makes use of only two shared secrets to satisfy the privacy requirement (with security analysis) and gives lower message overhead and at least a 45% higher success rate than previous solutions in the message verification phase using the bloom filter and the binary search techniques (through simulation study). We also provide the first group communication protocol to allow vehicles to authenticate and securely communicate with others in a group of known vehicles.
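
The abstract above mentions a Bloom filter in the message verification phase but does not spell out the construction. As a rough illustration only, a generic Bloom filter might look like the following Python sketch. The class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the salted SHA-256 hashing are all assumptions made for this example, not details taken from the paper.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch (hypothetical; not the SPECS construction)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical usage: record identifiers of messages that passed verification.
verified = BloomFilter()
verified.add(b"message-001")
verified.add(b"message-002")
```

A membership test such as `b"message-001" in verified` never yields a false negative for an added item, but it can yield a false positive, so a filter like this is suited to compactly announcing a set of results rather than replacing signature checks.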

MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample

Leung

Yiu

et al. 2012

112

Motivation: Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples with low-abundance species or (ii) samples (even with high abundance) when there are many extremely low-abundance species. These two problems are common for real metagenomic datasets. Binning methods that can solve these problems are desirable. Results: We proposed a two-round binning method (MetaCluster 5.0) that aims at identifying both low-abundance and high-abundance species in the presence of a large amount of noise due to many extremely low-abundance species. In summary, MetaCluster 5.0 uses a filtering strategy to remove noise from the extremely low-abundance species. It separates reads of high-abundance species from those of low-abundance species in two different rounds. To overcome the issue of low coverage for low-abundance species, multiple w values are used to group reads with overlapping w-mers, whereas reads from high-abundance species are grouped with high confidence based on a large w and then binning expands to low-abundance species using a relaxed (shorter) w. Compared to the recent tools, TOSS and MetaCluster 4.0, MetaCluster 5.0 can find more species (especially those with low abundance of say 6× to 10×) and can achieve better sensitivity and specificity using less memory and running time. Availability: http://i.cs.hku.hk/~alse/MetaCluster/ Contact: chin@cs.hku.hk
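
The idea of grouping reads that share w-mers, and of relaxing w to merge more reads, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the MetaCluster 5.0 implementation: the function name `group_reads_by_wmers` is hypothetical, reads are joined whenever they share any exact w-mer, and filtering, reverse complements, and the two-round logic are all omitted.

```python
from collections import defaultdict


def group_reads_by_wmers(reads, w):
    """Group read indices that share at least one exact w-mer (union-find)."""
    parent = list(range(len(reads)))

    def find(i):
        # Find the group representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # w-mer -> index of the first read that contained it
    for idx, read in enumerate(reads):
        for start in range(len(read) - w + 1):
            wmer = read[start:start + w]
            if wmer in first_seen:
                # Merge this read's group with the group of the earlier read.
                root_a, root_b = find(idx), find(first_seen[wmer])
                if root_a != root_b:
                    parent[root_a] = root_b
            else:
                first_seen[wmer] = idx

    groups = defaultdict(list)
    for idx in range(len(reads)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Toy reads (made up for illustration).
reads = ["ACGTAC", "GTACGG", "TTTTTT"]
strict_groups = group_reads_by_wmers(reads, w=6)   # large w: fewer merges
relaxed_groups = group_reads_by_wmers(reads, w=4)  # shorter w: more merges
```

On these toy reads the first two sequences share the 4-mer `GTAC`, so they fall into one group when w is 4 but stay separate when w is 6, which mirrors the trade-off the abstract describes between a large, high-confidence w and a relaxed, shorter w.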

VSPN: VANET-Based Secure and Privacy-Preserving Navigation

Chim

Yiu

Hui

et al. 2014

IEEE Trans. Comput.

150

In this paper, we propose a navigation scheme that utilizes the online road information collected by a vehicular ad hoc network (VANET) to guide drivers to their desired destinations in a real-time and distributed manner. The proposed scheme has the advantage of using real-time road conditions to compute a better route, and at the same time the information source can be properly authenticated. To protect the privacy of the drivers, the query (destination) and the driver who issues the query are guaranteed to be unlinkable to any party, including the trusted authority. We make use of the idea of anonymous credentials to achieve this goal. In addition to authentication and privacy preservation, our scheme fulfills all other necessary security requirements. Using real maps of New York and California, we conducted a simulation study of our scheme showing that it is effective in terms of processing delay and provides routes with much shorter travelling time.

SPECS: Secure and privacy enhancing communications schemes for VANETs

et al. 2011

Security Issues and Challenges for Cyber Physical System

Wang

et al. 2010

152
