David I. Warton scite author profile

Fitting a line to a bivariate dataset can be a deceptively complex problem, and there has been much debate on this issue in the literature. In this review, we describe for the practitioner the essential features of line-fitting methods for estimating the relationship between two variables: what methods are commonly used, which method should be used when, and how to make inferences from these lines to answer common research questions. A particularly important point for line-fitting in allometry is that usually, two sources of error are present (which we call measurement and equation error), and these have quite different implications for choice of line-fitting method. As a consequence, the approach in this review and the methods presented have subtle but important differences from previous reviews in the biology literature. Linear regression, major axis and standardised major axis are alternative methods that can be appropriate when there is no measurement error. When there is measurement error, this often needs to be estimated and used to adjust the variance terms in formulae for line-fitting. We also review line-fitting methods for phylogenetic analyses. Methods of inference are described for the line-fitting techniques discussed in this paper. The types of inference considered here are testing if the slope or elevation equals a given value, constructing confidence intervals for the slope or elevation, comparing several slopes or elevations, and testing for shift along the axis amongst several groups. In some cases several methods have been proposed in the literature. These are discussed and compared. In other cases there is little or no previous guidance available in the literature. Simulations were conducted to check whether the methods of inference proposed have the intended coverage probability or Type I error. We identified the methods of inference that perform well and recommend the techniques that should be adopted in future work.
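The three line-fitting methods named in this abstract can be compared side by side with a short sketch. The code below uses the standard textbook slope formulas for linear regression (OLS), major axis (MA) and standardised major axis (SMA) in the case with no measurement error; the function names `sample_moments` and `fit_lines` are illustrative assumptions, not taken from the paper or from any package.

```python
import math

def sample_moments(x, y):
    """Return the means, variances and covariance of x and y (n - 1 denominator)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return mean_x, mean_y, var_x, var_y, cov_xy

def fit_lines(x, y):
    """Return {method: (slope, elevation)} for OLS, SMA and MA.

    Assumes no measurement error. All three lines pass through the
    centroid (mean_x, mean_y), so the elevation follows from the slope.
    """
    mean_x, mean_y, var_x, var_y, cov_xy = sample_moments(x, y)
    if var_x == 0 or cov_xy == 0:
        raise ValueError("slopes are undefined when var(x) or cov(x, y) is zero")
    slopes = {
        # Linear regression of y on x.
        "ols": cov_xy / var_x,
        # SMA: ratio of standard deviations, signed by the covariance.
        "sma": math.copysign(math.sqrt(var_y / var_x), cov_xy),
        # MA: direction of the first principal axis of the covariance matrix.
        "ma": (var_y - var_x + math.sqrt((var_y - var_x) ** 2 + 4 * cov_xy ** 2))
              / (2 * cov_xy),
    }
    return {name: (b, mean_y - b * mean_x) for name, b in slopes.items()}
```

On perfectly collinear data the three slopes coincide. With scatter, the OLS slope is never larger in magnitude than the SMA slope, because it equals the SMA slope multiplied by the absolute correlation.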

The arcsine is asinine: the analysis of proportions in ecology

Warton, Hui 2011

Ecology

1,906

1,311

Abstract. The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data. However, it is important to check the data for additional unexplained variation, i.e., overdispersion, and to account for it via the inclusion of random effects in the model if found. For non-binomial data, the arcsine transform is undesirable on the grounds of interpretability, and because it can produce nonsensical predictions. The logit transformation is proposed as an alternative approach to address these issues. Examples are presented in both cases to illustrate these advantages, comparing various methods of analyzing proportions including untransformed, arcsine- and logit-transformed linear models and logistic regression (with or without random effects). Simulations demonstrate that logistic regression usually provides a gain in power over other methods.
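One way to see how the arcsine transform can yield nonsensical predictions is to back-transform a fitted value that falls outside the transform's range. The sketch below is an illustration under that reading, not a reproduction of the paper's examples; the function names are assumptions chosen for clarity.

```python
import math

def arcsine_sqrt(p):
    """Arcsine square root transform; maps [0, 1] onto [0, pi/2]."""
    return math.asin(math.sqrt(p))

def inv_arcsine_sqrt(z):
    """Back-transform; only meaningful for z in [0, pi/2]."""
    return math.sin(z) ** 2

def logit(p):
    """Logit transform; maps (0, 1) onto the whole real line.

    Proportions of exactly 0 or 1 raise an error here and need separate
    handling, which this sketch does not attempt.
    """
    return math.log(p / (1 - p))

def inv_logit(z):
    """Back-transform; any moderate real z gives a value strictly between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# A linear model on the arcsine scale can predict beyond pi/2. Because
# sin^2 is periodic, a larger fitted value then maps to a smaller proportion.
upper = math.pi / 2
beyond = inv_arcsine_sqrt(upper + 0.3)
mirror = inv_arcsine_sqrt(upper - 0.3)
```

Here `beyond` and `mirror` are equal, so two different fitted values map back to the same proportion. On the logit scale, the back-transform is monotone and fitted values map back to proportions between 0 and 1.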

Cross‐validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure

et al. 2017

… may also lead to dependence between species (phylogenetic structure) or populations of species (genetic structure): those with more recent divergence will tend to be more similar than those which diverged longer ago (Harvey and Pagel 1991). While such underlying structures in the data are not fundamentally problematic for statistical analyses, they tend to create two undesirable outcomes. First, model error, as well as neglected processes and variables connected to these structures, often leads to dependence structures in the model residuals, which violates the critical assumption of independence present in many models and methods (Legendre and Fortin 1989, Miller et al. 2007). Second, because predictor variables are often correlated with underlying dependence structures (e.g. climate with space), models may use predictors to overfit the residual dependence structure and thereby remove it, partially or completely.
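A common response to the dependence problem described in this excerpt is to hold out whole blocks of related observations (sites, time periods, clades) rather than random individual rows. The sketch below shows one simple way to build such folds; it is an assumption about what a block cross-validation split can look like, not the procedure from the paper, and `block_folds` is a hypothetical helper name.

```python
from collections import defaultdict

def block_folds(block_ids, n_folds):
    """Split row indices into folds so that no block is shared between folds.

    block_ids: one label per observation (e.g. site, year, clade).
    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    blocks = defaultdict(list)
    for index, label in enumerate(block_ids):
        blocks[label].append(index)
    if n_folds < 2 or n_folds > len(blocks):
        raise ValueError("n_folds must be between 2 and the number of blocks")

    # Greedy balancing: place the largest blocks first, each into the
    # fold that currently holds the fewest observations.
    folds = [[] for _ in range(n_folds)]
    for label in sorted(blocks, key=lambda b: -len(blocks[b])):
        smallest = min(folds, key=len)
        smallest.extend(blocks[label])

    all_indices = set(range(len(block_ids)))
    splits = []
    for test in folds:
        train = sorted(all_indices - set(test))
        splits.append((train, sorted(test)))
    return splits
```

For example, `block_folds(["a", "a", "b", "b", "b", "c", "d", "d"], 2)` returns two splits in which every site label appears on only one side, so a model cannot be scored on observations from a block it was trained on.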

mvabund– an R package for model‐based analysis of multivariate abundance data

Wang, Naumann, Wright et al. 2012

Methods Ecol Evol

1,245

1,115

smatr 3– an R package for estimation and inference about allometric lines

et al. 2011

Summary 1. The Standardised Major Axis Tests and Routines (SMATR) software provides tools for estimation and inference about allometric lines, currently widely used in ecology and evolution. 2. This paper describes some significant improvements to the functionality of the package, now available on R in smatr version 3. 3. New inclusions in the package include sma and ma functions that accept formula input and perform the key inference tasks; multiple comparisons; graphical methods for visualising data and checking (S)MA assumptions; robust (S)MA estimation and inference tools.

Key-words: common slope testing, model II regression, principal component analysis, robust estimation, standardised major axis

Biologists often wish to estimate how one variable scales against another and to test hypotheses about the nature of this relationship and how it varies across samples. The most common example of this is allometry (Reiss 1989); hence, we refer to this problem as one of estimation and testing about allometric lines. An example is given in Fig. 1a, where we wish to understand how leaf lifespan (longev) scales against leaf mass per area (lma) and how this relationship changes across sites with different rainfall (rain). longev and lma are log-transformed prior to analysis and are approximately linearly related on the transformed scale. This is common in allometry, and it means that their relationship approximately follows a power law, $\mathrm{longev} = a \cdot \mathrm{lma}^{b}$. The 'scaling exponent' b is the slope on log-transformed axes, and the magnitude of this parameter describes how steep the leaf lifespan-leaf mass per area relationship is. The 'proportionality coefficient' a, related to the elevation on log-log axes, is needed to understand how long-lived leaves of a given mass per area will be.

Estimating a and b is not a simple linear regression problem because we are not interested in predicting one variable from another; we are interested in estimating some underlying line of best fit (Warton et al., 2006). Another way to understand this is to see that the problem is symmetric: the basic problem does not change if we plot lma on the Y axis instead of the X axis (Smith 2009). Hence, the appropriate methods for analysis have more in common with principal component analysis, a multivariate approach, than with linear regression (Warton et al., 2006). Common approaches to estimating the line of best fit are standardised major axis (SMA) and major axis (MA) estimation, which will be collectively referred to as (S)MA, and which are widely used in ecology and evolution.

Warton et al. (2006) reviewed (S)MA techniques, proposed routines for comparing the parameters a and b amongst groups and developed software to implement the methods. The Standardised Major Axis Tests and Routines (SMATR) software, available in both R (R Development Core Team 2010) and C++, has since been used in over 200 publications. We have made significant improvements to the software in the recently released version 3 of the smatr R package, and this paper briefly describe...
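The symmetry argument in this excerpt can be checked numerically. The sketch below is written in Python rather than with the smatr package, and the leaf values are made up purely for illustration. It fits both the regression slope and the SMA slope on log-log axes, then swaps the axes: the two SMA slopes are exact reciprocals, whereas the two regression slopes are not.

```python
import math

def ols_and_sma_slopes(x, y):
    """Return (OLS slope of y on x, SMA slope of y against x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ols = sxy / sxx
    sma = math.copysign(math.sqrt(syy / sxx), sxy)
    return ols, sma

# Hypothetical values, not data from the paper.
lma = [50.0, 80.0, 120.0, 200.0, 320.0]
longev = [4.0, 7.0, 9.0, 18.0, 25.0]

# Log-transform so that the slope estimates the scaling exponent b.
log_lma = [math.log10(v) for v in lma]
log_longev = [math.log10(v) for v in longev]

ols_xy, sma_xy = ols_and_sma_slopes(log_lma, log_longev)
ols_yx, sma_yx = ols_and_sma_slopes(log_longev, log_lma)
```

The product `sma_xy * sma_yx` equals 1 up to rounding, so the fitted line is the same whichever variable is placed on the Y axis. The product `ols_xy * ols_yx` equals the squared correlation, which is below 1 whenever there is scatter.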

So Many Variables: Joint Modeling in Community Ecology

Warton, Blanchet, O’Hara et al. 2015

Trends in Ecology & Evolution

625

808

Modulation of leaf economic traits and trait relationships by climate

Wright, Reich, Cornelissen et al. 2005

Global Ecology and Biogeography

711

678

Aim: Our aim was to quantify climatic influences on key leaf traits and relationships at the global scale. This knowledge provides insight into how plants have adapted to different environmental pressures, and will lead to better calibration of future vegetation-climate models. Location: The data set represents vegetation from 175 sites around the world. Methods: For more than 2500 vascular plant species, we compiled data on leaf mass per area (LMA), leaf life span (LL), nitrogen concentration (N_mass) and photosynthetic capacity (A_mass). Site climate was described with several standard indices. Correlation and regression analyses were used for quantifying relationships between single leaf traits and climate. Standardized major axis (SMA) analyses were used for assessing the effect of climate on bivariate relationships between leaf traits. Principal components analysis (PCA) was used to summarize multidimensional trait variation. Results: At hotter, drier and higher irradiance sites, (1) mean LMA and leaf N per area were higher; (2) average LL was shorter at a given LMA, or the increase in LL was less for a given increase in LMA (LL-LMA relationships became less positive); and (3) A_mass was lower at a given N_mass, or the increase in A_mass was less for a given increase in N_mass. Considering all traits simultaneously, 18% of variation along the principal multivariate trait axis was explained by climate. Main conclusions: Trait-shifts with climate were of sufficient magnitude to have major implications for plant dry mass and nutrient economics, and represent substantial selective pressures associated with adaptation to different climatic regimes.

Distance‐based multivariate analyses confound location and dispersion effects

Warton, Wright, Wang 2011

Methods Ecol Evol

951

737

Summary 1. A critical property of count data is its mean-variance relationship, yet this is rarely considered in multivariate analysis in ecology. 2. This study considers what is being implicitly assumed about the mean-variance relationship in distance-based analyses (multivariate analyses based on a matrix of pairwise distances) and what the effect is of any misspecification of the mean-variance relationship. 3. It is shown that distance-based analyses make implicit assumptions that are typically out-of-step with what is observed in real data, which has major consequences. 4. Potential consequences of this mean-variance misspecification are: confounding location and dispersion effects in ordinations; misleading results when trying to identify taxa in which an effect is expressed; failure to detect a multivariate effect unless it is expressed in high-variance taxa. 5. Data transformation does not solve the problem. 6. A solution is to use generalised linear models and their recent multivariate generalisations, which is shown here to have desirable properties.

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

David I. Warton

Bivariate line‐fitting methods for allometry
