Data integration to align cells across batches has become a cornerstone of most single-cell analysis pipelines and critically affects downstream analyses. Yet how much signal is erased from the data during integration? Currently, there are no guidelines for when biological signals are separable from batch effects in single-cell studies, so studies usually take a black-box, trial-and-error approach to batch integration. We show evidence that current paradigms for single-cell data integration are unnecessarily aggressive, removing biologically meaningful variation. To remedy this, we present a novel statistical model and computationally scalable algorithm, CellANOVA, to recover biological signal that is lost during single-cell data integration. CellANOVA utilizes a “pool-of-controls” design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest. When applied with existing integration methods, CellANOVA allows the recovery of subtle biological signals and corrects, to a large extent, the data distortion introduced by integration. Further, CellANOVA explicitly estimates cell- and gene-specific batch effect terms, which can be used to identify the cell types and pathways exhibiting the largest batch variations, providing clarity as to which biological signals can be recovered.