Over the past few decades correspondence analysis has gained an international reputation as a powerful statistical tool for the graphical analysis of contingency tables. This popularity stems from its development and application in many European countries, especially France, and its use has spread to English-speaking nations such as the United States and the United Kingdom. Its growing popularity amongst statistical practitioners, and more recently in those disciplines where the role of statistics is less dominant, demonstrates the importance of the continuing research and development of the methodology. The aim of this paper is to highlight the theoretical, practical and computational issues of simple correspondence analysis and to discuss its relationship with recent advances that can be used to graphically display the association in two-way categorical data.
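As a rough illustration of the computational side of simple correspondence analysis, the sketch below uses the common formulation based on a singular value decomposition of the standardized residuals. It is not taken from the paper: the function name `simple_ca` and the 3 x 3 table are made up for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def simple_ca(table):
    """Simple correspondence analysis of a two-way table via SVD (illustrative sketch)."""
    N = np.asarray(table, dtype=float)
    n = N.sum()
    P = N / n                          # correspondence matrix of proportions
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    # Standardized residuals from the independence model
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of the rows and columns
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    total_inertia = float((sv ** 2).sum())
    return row_coords, col_coords, sv, total_inertia, n

# Hypothetical counts, used only to exercise the function
table = [[20, 10, 5],
         [10, 15, 10],
         [5, 10, 25]]
row_coords, col_coords, sv, total_inertia, n = simple_ca(table)
print("principal inertias:", sv ** 2)
print("n * total inertia (Pearson chi-squared):", n * total_inertia)
```

The first two columns of `row_coords` and `col_coords` are what would typically be plotted. In this formulation the total inertia multiplied by the sample size equals the Pearson chi-squared statistic of the table.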
A method is developed that caters for the application of correspondence analysis to two-way contingency tables with one and two ordered sets of categories. The method involves calculating orthogonal polynomials of the type described by EMERSON (1968) and partitioning the chi-square statistic using the method described in LANCASTER (1953). The method has all the features of simple correspondence analysis, but also allows additional information about the structure and association of the data to be obtained by isolating location, dispersion and higher order components of the rows and columns.
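The partition described above can be sketched for the doubly ordered case as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the polynomials are built by weighted Gram-Schmidt orthogonalization rather than by Emerson's recurrence (orthonormal polynomials for given weights and scores agree up to sign), the category scores are assumed to be the natural scores 1, 2, ..., and the table and function names are hypothetical.

```python
import math

def orthonormal_polynomials(scores, weights):
    """Polynomials in `scores`, orthonormal with respect to `weights`.

    polys[u][i] is the degree-u polynomial evaluated at category i.
    Assumes distinct scores and strictly positive weights.
    """
    scores = list(scores)
    polys = []
    for degree in range(len(scores)):
        v = [s ** degree for s in scores]
        # Remove the components along the lower-degree polynomials
        for q in polys:
            proj = sum(w * a * b for w, a, b in zip(weights, v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(w * a * a for w, a in zip(weights, v)))
        polys.append([a / norm for a in v])
    return polys

def chi2_partition(table):
    """Components Z[u-1][v-1] whose squares sum to the Pearson chi-squared statistic."""
    n = sum(map(sum, table))
    I, J = len(table), len(table[0])
    p = [[x / n for x in row] for row in table]
    r = [sum(row) for row in p]                                  # row proportions
    c = [sum(p[i][j] for i in range(I)) for j in range(J)]       # column proportions
    a = orthonormal_polynomials(range(1, I + 1), r)
    b = orthonormal_polynomials(range(1, J + 1), c)
    return [[math.sqrt(n) * sum(a[u][i] * b[v][j] * p[i][j]
                                for i in range(I) for j in range(J))
             for v in range(1, J)]
            for u in range(1, I)]

# Hypothetical 3 x 4 table with ordered rows and columns
table = [[12, 8, 5, 2],
         [7, 10, 9, 6],
         [3, 6, 11, 14]]
Z = chi2_partition(table)
chi2_from_partition = sum(z * z for row in Z for z in row)
print("linear-by-linear (location) component:", Z[0][0])
print("chi-squared recovered from the partition:", chi2_from_partition)
```

Here `Z[0][0]` corresponds to the linear-by-linear (location) term and `Z[1][1]` to the quadratic-by-quadratic (dispersion) term. The singly ordered case mentioned in the abstract is not covered by this sketch.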
Ecological analysis involves using aggregate data for a set of groups to make inferences concerning individual level relationships. Typically the data available for analysis consist of the means or totals of variables of interest for geographical areas, although the groups can be organisations such as schools or hospitals. Attention has focused on developing methods of estimating the parameters characterising the individual level relationships across the whole population, but also in some cases the relationships for each of the groups. Applying standard methods used to analyse individual level data, such as linear or logistic regression or contingency table analysis, to aggregate data will usually produce biased estimates of individual level relationships. Thus much of the effort in ecological analysis has concentrated on developing methods of analysing aggregate data that can produce unbiased, or less biased, parameter estimates. There has been less work done on inference procedures, such as constructing confidence intervals and hypothesis testing. Fundamental to these inferential issues is the question of how much information is contained in aggregate data and what evidence such data can provide concerning important assumptions and hypotheses.
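A toy example may help to show why a standard method applied to aggregate data can mislead. The data below are entirely made up: three groups of three individuals, with a group-level effect on the outcome. The sketch compares the least-squares slope fitted to the pooled individual data with the slope fitted to the group means only.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x for paired lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical individual-level data: (x, y) pairs for each group
groups = {
    "A": [(-10, 10), (0, 0), (10, -10)],
    "B": [(-5, 20), (5, 10), (15, 0)],
    "C": [(0, 30), (10, 20), (20, 10)],
}

# Slope from the pooled individual-level data
xs = [x for obs in groups.values() for x, _ in obs]
ys = [y for obs in groups.values() for _, y in obs]
individual_slope = ols_slope(xs, ys)

# Slope from the group means, which is all an ecological analysis would observe
mean_x = [sum(x for x, _ in obs) / len(obs) for obs in groups.values()]
mean_y = [sum(y for _, y in obs) / len(obs) for obs in groups.values()]
aggregate_slope = ols_slope(mean_x, mean_y)

print("individual-level slope:", individual_slope)
print("aggregate-level slope:", aggregate_slope)
```

In this constructed example the two slopes have opposite signs, because the group means of x are related to the group-level effect on y. Real data need not behave so starkly.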
S3 RI Methodology Working Paper M03/14
Parameter estimation for association and log-linear models is an important aspect of the analysis of cross-classified categorical data. Classically, iterative procedures, including Newton's method and iterative scaling, have been used to calculate the maximum likelihood estimates of these parameters. An important special case occurs when the categorical variables are ordinal and this has received a considerable amount of attention for more than 20 years. This is because models for such cases involve the estimation of a parameter that quantifies the linear-by-linear association and is directly linked with the natural logarithm of the common odds ratio. The past five years have seen the development of non-iterative procedures for estimating the linear-by-linear parameter for ordinal log-linear models. Such procedures have been shown to lead to numerically equivalent estimates when compared with iterative, maximum likelihood estimates. Such procedures also enable the researcher to avoid some of the computational difficulties that commonly arise with iterative algorithms. This paper investigates and evaluates the performance of three non-iterative procedures for estimating this parameter by considering 14 contingency tables that have appeared in the statistical and allied literature. The estimation of the standard error of the association parameter is also considered.
We argue that making accept/reject decisions on scientific hypotheses, including a recent call for changing the canonical alpha level from p = 0.05 to p = 0.005, is deleterious for the finding of new discoveries and the progress of science. Given that blanket and variable alpha levels both are problematic, it is sensible to dispense with significance testing altogether. There are alternatives that address study design and sample size much more directly than significance testing does; but none of the statistical tools should be taken as the new magic method giving clear-cut mechanical answers. Inference should not be based on single studies at all, but on cumulative evidence from multiple independent studies. When evaluating the strength of the evidence, we should consider, for example, auxiliary assumptions, the strength of the experimental design, and implications for applications. To boil all this down to a binary decision based on a p-value threshold of 0.05, 0.01, 0.005, or anything else, is not acceptable.
The paper presents a partition of the Pearson chi-squared statistic for triply ordered three-way contingency tables. The partition invokes orthogonal polynomials and identifies three-way association terms as well as each combination of two-way associations. This partition provides information about the structure of each variable by identifying important bivariate and trivariate associations in terms of location (linear), dispersion (quadratic) and higher order components. The significance of each term in the partition, and of each association within each term, can also be determined. The paper compares the chi-squared partition with the log-linear models of Agresti (1994) for multi-way contingency tables with ordinal categories, by generalizing the model proposed by Haberman (1974).