Motivation: Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. One main challenge is to distinguish true biological variants from errors caused by PCR and sequencing. In the traditional analysis pipeline, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false positive rates. Recently developed "denoising" methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low frequency sequences, especially those near abundant variants, because they ignore the sequencing quality information.

Results: We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI takes into account quality information and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. We show that AmpliCI is superior to three popular denoising methods, with acceptable computation time and memory usage.

Availability: Source code available at https://github.com/DormanLab/AmpliCI

The utility of biomarkers is degraded by sequencing errors, PCR amplification errors, and intrastrain/species-specific variability [1]. To account for these factors, a typical first step of microbiome analysis is to resolve the data into Operational Taxonomic Units (OTUs), or clusters of sequences with 97% or greater similarity.
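
To make the 97% threshold concrete, the following minimal sketch computes the identity between two pre-aligned, equal-length sequences and checks whether they would fall within one OTU. The 250-nt amplicon, the helper name `percent_identity`, and the mismatch-only identity measure are illustrative assumptions; real OTU pipelines use alignment-based similarity and their own clustering rules.

```python
OTU_THRESHOLD = 0.97  # conventional OTU similarity cutoff


def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences.

    Hypothetical helper: assumes equal length and no gaps.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


# Hypothetical 250-nt amplicon and a variant that differs by a single SNP.
reference = "ACGT" * 62 + "AC"
snp_base = "C" if reference[100] == "A" else "A"
variant = reference[:100] + snp_base + reference[101:]

identity = percent_identity(reference, variant)  # 249 matches out of 250
same_otu = identity >= OTU_THRESHOLD

print(f"identity = {identity:.3f}, merged into one OTU: {same_otu}")
```

Under these assumptions, a single-nucleotide variant is 99.6% identical to its neighbor, so a 97% threshold cannot separate the two, whether the difference is a real SNP or an error.
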
There are many methods for identifying OTUs [2], roughly classifiable into closed-reference methods, which use a reference database of known organisms, and de novo methods. However, when applied to mock communities, it is widely found that neither type of method accurately identifies the true OTUs in a sample [3,4,5,6,7,8].

OTUs are problematic entities, lacking both biological and physical interpretability. They only roughly correspond to biological species, genera or higher taxonomic entities, and they do not correspond to true, error-free sequences in the sample. Thus, OTU-based methods are prone to both false positives and false negatives, reporting error sequences as OTUs and missing subtle but real biological sequence variation, such as SNPs. The 97% threshold, motivated by empirical studies [9, 10], fails to reliably achieve genus or species level resolution [11,12]. There are distinct species whose 16S rRNA sequences are 97% or more similar [13,14], and strains whose 16S rRNA locally differ by more than 3% [15].

Amplicon sequencing data from current Illumina platforms support de novo single-nucleotide resolution [16]. Modern methods attempt to identify all the unique sequences in the sample [17,18,19,20,16,21,22,23]. Such denoising methods make no biological judgment on taxonomic entities, but simply remove or correct sequences produc...