Mining frequent itemsets in a datastream proves to be a difficult problem, as itemsets arrive in rapid succession and storing parts of the stream is typically impossible. Nonetheless, it has many useful applications; e.g., opinion and sentiment analysis from social networks. Current stream mining algorithms are based on approximations. In earlier work, mining frequent items in a stream under the max-frequency measure proved to be effective for items. In this paper, we extended our work from items to itemsets. Firstly, an optimized incremental algorithm for mining frequent itemsets in a stream is presented. The algorithm maintains a very compact summary of the stream for selected itemsets. Secondly, we show that further compacting the summary is nontrivial. Thirdly, we establish a connection between the size of a summary and results from number theory. Fourthly, we report results of extensive experimentation, both of synthetic and real-world datasets, showing the efficiency of the algorithm both in terms of time and space.
The impact of input frequency (IF) and functional load (FL) of segments in the ambient language on the acquisition order of word-initial consonants is investigated. Several definitions of IF/FL are compared and implemented. The impact of IF/FL and their components are computed using a longitudinal corpus of interactions between thirty Dutch-speaking children (age range: 0 ; 6–2 ; 0) and their primary caretaker(s). The corpus study reveals significant correlations between IF/FL and acquisition order. The highest predictive values are found for the token frequency of segments, and for FL computed on minimally different word types in child-directed speech. Although IF and FL significantly correlate, they do have a different impact on the order of acquisition of word-initial consonants. When the impact of IF is partialed out, FL still has a significant correlation with acquisition order. The reverse is not true, suggesting that the acquisition of word-initial consonants is mainly influenced by their discriminating function.
Our goal is to better understand, at a theoretical level, the database aspects of DNA computing. Thereto, we introduce a formally defined data model of so-called sticker DNA complexes, suitable for the representation and manipulation of structured data in DNA. We also define DNAQL, a restricted programming language over sticker DNA complexes. DNAQL stands to general DNA computing as the standard relational algebra for relational databases stands to general-purpose conventional computing. The number of operations performed during the execution of a DNAQL program, on any input, is only polynomial in the dimension of the data, i.e., the number of bits needed to represent a single data entry. Moreover, each operation can be implemented in DNA using a constant number of laboratory steps. We prove that the relational algebra can be simulated in DNAQL.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.