what are called Monte Carlo calculations. Such calculations depend on having available sequences of numbers which appear to be drawn at random from particular probability distributions. For convenience we will refer to any such numbers simply as random numbers. Our purpose is to survey the problem of obtaining these sequences of numbers, with particular emphasis on the procedures used for their generation on stored-program computers. The term "pseudo-random" is often used to describe the random numbers which are obtained on computers. We begin in section 2 with a brief indication of what types of calculations require such a supply of random numbers. Then in section 3 we turn to the main topic, which is a thorough treatment of the number theoretic properties of the methods of generation called "mixed congruential", followed for comparison by a brief treatment of the older "multiplicative congruential" methods. We find that the former have several theoretical advantages over the latter. We also refer briefly to some recent theoretical results concerning the serial correlation of the generated sequences. In section 4 we consider some of the statistical properties which must also be required of these sequences. Here the mixed methods lose some of their attractiveness. Under certain circumstances they can produce sequences which fail to pass the required tests. On the other hand the multiplicative methods produce sequences with consistently good statistical properties. In the final three sections we summarize other aspects of the subject. In section 5 we consider the problem of using numbers from the uniform distribution to obtain numbers from various other distributions. In section 6 we draw attention to several problems which seem to warrant further study.
Finally, in section 7, we describe some of the historical development of the subject, and here we refer to other methods, and to other approaches to the problem. The bibliography is intended to be complete with respect to references concerned with the generation of random numbers on computers. It contains substantially all such references to the open literature, as well as many references to government and company reports. In addition it contains a number of references concerning each of the related topics which are considered in this paper.
Generally speaking, Monte Carlo methods have not been particularly successful when applied to these less natural situations. However, they are indispensable in most of those natural applications, where there is often no alternative procedure.
CALCULATIONS REQUIRING RANDOM NUMBERS
A characteristic of the Monte Carlo method is that the required solution is approached with an error which is $O(n^{-1/2})$, where $n$ is the number of trials,
Citizen science initiatives encourage volunteer participants to collect and interpret data and contribute to formal scientific projects. The growth of virtual citizen science (VCS), facilitated through websites and mobile applications since the mid-2000s, has been driven by a combination of software innovations and mobile technologies, growing scientific data flows without commensurate increases in resources to handle them, and the desire of internet-connected participants to contribute to collective outputs. However, the increasing availability of internet-based activities requires individual VCS projects to compete for the attention of volunteers and promote their long-term retention. We examined program and platform design principles that might allow VCS initiatives to compete more effectively for volunteers, increase productivity of project participants, and retain contributors over time. We surveyed key personnel engaged in managing a sample of VCS projects to identify the principles and practices they pursued for these purposes and led a team in a heuristic evaluation of volunteer engagement, website or application usability, and participant retention. We received 40 completed survey responses (33% response rate) and completed a heuristic evaluation of 20 VCS program sites. The majority of the VCS programs focused on scientific outcomes, whereas the educational and social benefits of program participation, variables that are consistently ranked as important for volunteer engagement and retention, were incidental. Evaluators indicated usability, across most of the VCS program sites, was higher and less variable than the ratings for participant engagement and retention. In the context of growing competition for the attention of internet volunteers, increased attention to the motivations of virtual citizen scientists may help VCS programs sustain the necessary engagement and retention of their volunteers.
Random number generators of the mixed congruential type have recently been proposed. They appear to have some advantages over those of the multiplicative congruential type, but they have not been thoroughly tested. This paper summarizes the results of extensive testing of these generators which has been carried out on a decimal machine. Most results are for word length 10, and special attention is given to simple multipliers which give fast generators. But other word lengths and many other multipliers are considered. A variety of additive constants is also used. It turns out that these mixed generators, in contrast to the multiplicative ones, are not consistently good from a statistical point of view. The cases which are bad seem to belong to a well-defined class which, unfortunately, includes most of the generators associated with the simple multipliers. However, a surprise result is that all generators associated with one of the simplest and fastest multipliers, namely 101, turn out to be consistently good for word lengths greater than seven digits. A final section of the paper suggests a simple theoretical explanation of these experimental results.
Generators To Be Tested
Almost all random number generators that are used in practice can be obtained as special cases of the following procedure. One begins with the non-negative integers $x_0$, $a$, $c$, and $m$, where $m$ is the largest. Then one defines $x_1, x_2, \ldots$ to be the non-negative integers less than $m$ generated by
$$x_{i+1} \equiv a x_i + c \pmod{m}, \qquad i = 0, 1, 2, \ldots$$
Finally, the sequence $x_0/m, x_1/m, \ldots$ is taken to be the sequence of random numbers. The hope is that the parameters $x_0$, $a$, $c$, and $m$ have been chosen so that the resulting sequence appears to be drawn at random from the uniform distribution on [0, 1]. Such a sequence is often called "pseudo-random." A general treatment of these generators is given in [6], along with an extensive bibliography.
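The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `lcg` and the small parameter values in the usage line are hypothetical and are not taken from the paper.

```python
from itertools import islice


def lcg(x0, a, c, m):
    """Yield x_0/m, x_1/m, ... where x_{i+1} = (a*x_i + c) mod m."""
    x = x0 % m
    while True:
        yield x / m              # scale the integer state into [0, 1)
        x = (a * x + c) % m      # the congruential recurrence


# Illustrative parameters only (a toy decimal generator with m = 100).
first = list(islice(lcg(x0=7, a=21, c=1, m=100), 5))
print(first)
```

With `c = 0` the same function gives a multiplicative generator; any other `c` gives a mixed one.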
In practice it is especially convenient to choose $m$ according to the particular computer being used, e.g. $10^{10}$ or $2^{35}$, and then the problem is to choose the remaining parameters $x_0$, $a$, and $c$ so that the period of the resulting sequence is as great as possible. Finally, one tests subsequences to see if they appear to be random. Until recently the only cases considered were those associated with choosing $c = 0$. Corresponding generators are called "multiplicative" congruential generators. By choosing $x_0$ and $a$ according to certain specifications [6], one can ensure that the resulting sequences are as long as is possible for this case, $c = 0$. For example, the maximal periods are $5 \times 10^{8}$ when $m = 10^{10}$, and $2^{33}$ when $m = 2^{35}$. Using $m = 10^{10}$ on a decimal machine, the fastest generators yielding maximal
Abstract. Random number generators of the mixed congruential type have recently been proposed. They appear to have some advantages over those of the multiplicative type, except that their statistical behavior is unsatisfactory in some cases. It is shown theoretically that a certain class of these mixed generators should be expected to fail statistical tests for randomness. Extensive testing confirms this hypothesis and makes possible a more precise definition of the unsatisfactory class. It is concluded that the advantages of mixed generators can be realized only in special circumstances. On machines with relatively short multiplication times the multiplicative generators are to be preferred.
The Generators
Most random number generators are based on a congruence of the form
$$x_{i+1} \equiv a x_i + c \pmod{m}.$$
The sequence of integers $x_0, x_1, x_2, \ldots$ is determined by the choice of $x_0$, $a$, $c$, and $m$, where these four parameters are non-negative integers, $m$ being the largest. Then the hope is that the sequence $x_0/m, x_1/m, x_2/m, \ldots$ will appear to be drawn at random from the uniform distribution on [0, 1]. If $c = 0$ the generator is "multiplicative," otherwise it is "mixed." A general discussion of these generators is given in [7] along with an extensive bibliography. The multiplicative generators have been used extensively and their statistical behavior appears generally to be satisfactory. Recently mixed generators have been proposed by Coveyou [2], Greenberger [5, 6] and Rotenberg [10]. (See also [7, 8, 9, 11].) The mixed generators appear to have a few small advantages over the multiplicative generators. One can choose $a$ and $c$ so that the sequence has the full period $m$, and this also means that any $x_0$ may be chosen. The theory behind this result is easier than the corresponding theory about the smaller period which is best possible for multiplicative generators.
Moreover, the conditions on $a$ and $c$ are easy to realize, and to remember: it is necessary and sufficient to have $c$ and $m$ relatively prime, and for a decimal machine to have $a \equiv 1 \pmod{20}$, while for a binary machine the requirement is $a \equiv 1 \pmod{4}$. But the main advantage mixed generators may offer over multiplicative ones is speed. On many machines they are faster because one can use a shift-and-add procedure in place of multiplying by $a$, when $a$ is of the form $10^s + 1$ on a decimal machine or $2^s + 1$ on a binary machine ($s \ge 2$).
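The full-period conditions and the shift-and-add idea stated above can be checked on a small example. The following is a minimal Python sketch, not the authors' procedure: the helper names `period`, `full_period_decimal`, and `times_2s_plus_1` are hypothetical, and the modulus $m = 10^4$ is chosen only so that a brute-force cycle search is fast.

```python
from math import gcd


def period(x0, a, c, m):
    """Length of the cycle that x_{i+1} = (a*x_i + c) mod m enters from x0."""
    seen = {}
    x, i = x0 % m, 0
    while x not in seen:
        seen[x] = i              # remember the step at which x first appeared
        x = (a * x + c) % m
        i += 1
    return i - seen[x]


def full_period_decimal(a, c, m):
    """Condition quoted in the text, assuming m = 10**d with d >= 2:
    c and m relatively prime, and a congruent to 1 modulo 20."""
    return gcd(c, m) == 1 and a % 20 == 1


def times_2s_plus_1(x, s):
    """Multiply x by a = 2**s + 1 using one shift and one add."""
    return (x << s) + x


m = 10**4
# a = 101 = 10**2 + 1 satisfies the condition; a = 11 does not (11 mod 20 = 11).
for a in (101, 11):
    print(a, full_period_decimal(a, 1, m), period(0, a, 1, m))
```

For `a = 101` and `c = 1` the measured period equals `m`, while `a = 11` falls short of it, in line with the condition being both necessary and sufficient. On a decimal machine the analogous trick for $a = 10^s + 1$ is a digit shift by $s$ places followed by one addition.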
The huge amount of free-form unstructured text in the blogosphere, its increasing rate of production, and its shrinking window of relevance, present serious challenges to the public policy analyst who seeks to take public opinion into account. Most of the tools which address this problem use XML tagging and other Web 3.0 approaches, which do not address the actual content of blog posts and the associated commentary. We give a tutorial review of latent semantic analysis and the self-organizing maps, as considered in this context, and show how to apply the self-organizing map over a probabilistic latent semantic space to the problem of completely unsupervised clustering of unstructured text in such a way as to be entirely independent of spelling, grammar, and even source language. This provides an algorithm suitable for clustering free-form commentary with a well-structured test environment. The algorithm is applied to academic paper abstracts instead, treated as unstructured text as though they were blog posts, because this set of documents has a known ground truth. The algorithm constructs a word category map and a document map in which words with similar meaning and documents with similar content are clustered together.