Motivation: In this work we present REINDEER, a novel computational method that indexes sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets.

Results: We used REINDEER to index the abundances of sequences within 2,585 human RNA-seq experiments in 45 hours using only 56 GB of RAM. This makes REINDEER the first method able to record abundances at the scale of ∼4 billion distinct k-mers across 2,585 datasets. REINDEER also supports exact presence/absence queries of k-mers. Briefly, REINDEER constructs the compacted de Bruijn graph (DBG) of each dataset, then conceptually merges those DBGs into a single global one. REINDEER then constructs and indexes monotigs, which in a nutshell are groups of k-mers of similar abundances.

Availability: https://github.com/kamimrcht/REINDEER

We also highlight that merely adapting existing data structures by widening the 1-bit presence/absence information into a counter (e.g. 16 bits) is unlikely to be a viable strategy. For instance, consider the HowDeSBT data structure [9], a recent technique for indexing the presence/absence of k-mers across dataset collections. It saves space by using a single memory location to encode the presence of a k-mer across multiple datasets. Yet this scheme cannot be adapted to record abundances, as a k-mer may be present in multiple datasets at different abundances, which cannot all be recorded by a single memory location. Likewise, BIGSI [12] uses Bloom filters with a 25% false positive rate to encode presence/absence of k-mers; extending Bloom filters to support abundance queries (e.g. using Count-Min sketches) at a comparable false positive rate would possibly introduce significant abundance estimation errors.

Here we introduce REINDEER (REad Index for abuNDancE quERy), a novel computational method that indexes k-mers and records their counts across a collection of datasets. REINDEER combines several concepts. The first novelty is to associate k-mers with their counts within datasets, instead of only recording the presence/absence of k-mers as is nearly universally done in previous works. To achieve this, a second novelty is the introduction of monotigs, which allow space-efficient grouping of k-mers having similar count profiles across datasets. An additional contribution is a set of techniques to further save space: discretization and compression of counts, and an on-disk row de-duplication algorithm for the count matrix. As a proof of concept, in this article we apply REINDEER to index a de facto benchmark collection of 2,585 human RNA-seq datasets, and provide relevant performance metrics. We further illustrate its utility by showing the results of queries on four oncogenes and three tumor suppressor genes within this collection.
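
To make the space-saving techniques mentioned above more concrete, the following minimal Python sketch illustrates how discretizing counts can make rows of a count matrix identical, so that duplicate rows need to be stored only once. It is an illustration under stated assumptions, not REINDEER's actual implementation: the function names `discretize` and `deduplicate_rows` are hypothetical, the power-of-two bucketing is only one possible discretization scheme, and everything is done in memory here whereas the text describes an on-disk algorithm.

```python
from typing import Dict, List, Tuple


def discretize(count: int) -> int:
    """Map a raw count to a coarse bucket (assumed scheme, for illustration).

    0 stays 0 (absent); any positive count c maps to floor(log2(c)) + 1,
    which is exactly the bit length of c.
    """
    return 0 if count <= 0 else count.bit_length()


def deduplicate_rows(rows: List[List[int]]) -> Tuple[List[Tuple[int, ...]], List[int]]:
    """Store each distinct row once and return one pointer per input row."""
    unique: List[Tuple[int, ...]] = []
    index_of: Dict[Tuple[int, ...], int] = {}
    pointers: List[int] = []
    for row in rows:
        key = tuple(row)
        if key not in index_of:
            index_of[key] = len(unique)
            unique.append(key)
        pointers.append(index_of[key])
    return unique, pointers


# Hypothetical count matrix: one row per group of k-mers, one column per dataset.
raw_counts = [
    [5, 0, 130],
    [4, 0, 200],
    [0, 17, 0],
    [7, 0, 129],
]

discretized = [[discretize(c) for c in row] for row in raw_counts]
unique_rows, row_pointers = deduplicate_rows(discretized)

print(unique_rows)   # distinct discretized rows
print(row_pointers)  # for each original row, the index of its stored row
```

In this toy example, three of the four rows become identical after discretization, so only two rows are stored. The trade-off is that a query returns an approximate abundance (a bucket) rather than the exact count.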