February 2014 archive

Analysis of variance: ANOVA (2 way)

The technique for a one-way ANOVA can be extended to situations where there is more than one factor, or – indeed – where there are several factors with several levels each, which may have synergistic or antagonistic effects on each other. In the models we have seen so far (linear regression, one-way ANOVA) all we …
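
As a minimal sketch of how the one-way idea extends to two factors, the Python below computes F statistics for both main effects and for their interaction (the "synergistic or antagonistic" part). It assumes a balanced design, with the same number of replicates in every combination of levels, and uses made-up numbers. Unbalanced data need a different decomposition, and the full post may proceed differently. The names `two_way_anova` and `cells` are illustrative only.

```python
from statistics import fmean

def two_way_anova(cells):
    """Balanced two-way ANOVA with replication.

    cells maps (level_of_A, level_of_B) -> list of replicate observations.
    Assumes every combination of levels is present with the same number of replicates.
    Returns the F statistic for factor A, factor B and the A:B interaction.
    """
    a_levels = sorted({ka for ka, _ in cells})
    b_levels = sorted({kb for _, kb in cells})
    n_a, n_b = len(a_levels), len(b_levels)
    r = len(next(iter(cells.values())))  # replicates per cell
    grand = fmean([x for obs in cells.values() for x in obs])

    def level_mean(index, level):
        # Mean of all observations sharing this level of one factor (index 0 = A, 1 = B)
        return fmean([x for key, obs in cells.items() if key[index] == level for x in obs])

    cell_mean = {key: fmean(obs) for key, obs in cells.items()}
    ss_a = n_b * r * sum((level_mean(0, lv) - grand) ** 2 for lv in a_levels)
    ss_b = n_a * r * sum((level_mean(1, lv) - grand) ** 2 for lv in b_levels)
    ss_cells = r * sum((m - grand) ** 2 for m in cell_mean.values())
    # Interaction: variation among cell means not explained by the two main effects
    ss_ab = ss_cells - ss_a - ss_b
    # Residual: scatter of replicates around their own cell mean
    ss_err = sum((x - cell_mean[key]) ** 2 for key, obs in cells.items() for x in obs)

    df_a, df_b = n_a - 1, n_b - 1
    df_ab, df_err = df_a * df_b, n_a * n_b * (r - 1)
    ms_err = ss_err / df_err
    return {
        "A": (ss_a / df_a) / ms_err,
        "B": (ss_b / df_b) / ms_err,
        "A:B": (ss_ab / df_ab) / ms_err,
    }

# Made-up 2 x 2 design with two replicates per cell (illustrative data only)
example = {
    ("a1", "b1"): [1, 3], ("a1", "b2"): [3, 5],
    ("a2", "b1"): [5, 7], ("a2", "b2"): [11, 13],
}
print(two_way_anova(example))  # {'A': 36.0, 'B': 16.0, 'A:B': 4.0}
```

Each F would then be compared against the F distribution with the matching degrees of freedom (numerator df for that term, denominator df for the residual) to get a p-value.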

Analysis of variance: ANOVA (1 way)

Analysis of variance is the technique to use when you might otherwise be considering a large number of pairwise F and t tests, i.e. where you want to know whether a factor with more than 2 levels is a useful predictor of a dependent variable. For example, cuckoo_eggs.csv contains data on the length of cuckoo eggs laid …
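
To show what a one-way ANOVA actually computes, here is a minimal Python sketch of the F statistic: the variance between the level means divided by the variance within levels. It does not load cuckoo_eggs.csv; the three toy groups below are made up, and the function name `one_way_anova_f` is hypothetical.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per factor level."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    means = [fmean(g) for g in groups]
    k = len(groups)       # number of levels of the factor
    n = len(all_values)   # total number of observations
    # Between-level sum of squares: how far each level's mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-level sum of squares: scatter of observations around their own level's mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Three made-up levels of one factor
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

A single F test of this kind replaces the many pairwise comparisons: a large F says that at least one level's mean differs, not which one.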

Comparison of expected and observed count data: the χ² test

A χ² test is used to measure the discrepancy between the observed and expected values of count data. The dependent data must – by definition – be count data. If there are independent variables, they must be categorical. The test statistic derived from the two data sets is called χ², and it is defined as …
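
The excerpt breaks off before the definition, so the sketch below assumes the usual Pearson form, χ² = Σ (O − E)² / E, summed over all categories. The function name `chi_squared` and the counts are illustrative, not taken from the post.

```python
def chi_squared(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up example: 100 coin tosses giving 60 heads and 40 tails, against a fair 50:50 expectation
print(chi_squared([60, 40], [50, 50]))  # 4.0
```

For a simple goodness-of-fit test like this one, the statistic is compared with the χ² distribution on (number of categories − 1) degrees of freedom, here 1.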

Correlation of data: linear regression

Linear regression is used to see whether one continuous variable is correlated with another continuous variable in a linear way, i.e. whether the dependent variable y can be modelled as a straight-line response to changes in the independent covariate x:

y = a + bx

Here b is the estimated slope of the best-fit line (a.k.a. gradient, often written m), a …
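
A minimal sketch of the ordinary least-squares estimates, assuming the straight-line model y = a + bx with b as the slope and a as the intercept: the slope is the covariance of x and y divided by the variance of x, and the fitted line passes through the point of means. The function name `least_squares_fit` and the data are hypothetical.

```python
from statistics import fmean

def least_squares_fit(x, y):
    """Return (a, b) for the ordinary least-squares line y = a + b*x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope (gradient)
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Made-up points lying exactly on y = 1 + 2x
print(least_squares_fit([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```

On Python 3.10 or later, `statistics.linear_regression(x, y)` returns the same slope and intercept; the hand-written version just makes the formula visible.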

Statistical testing

If you want a random yes/no answer to a question, like “who should kick off this football match?” it’s very common to entrust the decision to the flip of a coin, on the assumption that the coin doesn’t care which side gets the advantage. But what if that trust is misplaced? What if the coin gives …
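
One standard way to ask whether a coin is fair is an exact binomial test: how likely is a result at least as extreme as the one observed, if heads really has probability 0.5? The sketch below is not necessarily the approach the full post takes; the function name `binomial_two_sided_p` is hypothetical, and "at least as extreme" is taken to mean "no more probable than the observed outcome".

```python
from math import comb

def binomial_two_sided_p(heads, flips, p=0.5):
    """Exact two-sided binomial p-value for seeing `heads` in `flips` tosses.

    Sums the probabilities of every outcome that is no more likely than the
    observed one, assuming heads has probability p on each independent toss.
    """
    probs = [comb(flips, k) * p ** k * (1 - p) ** (flips - k) for k in range(flips + 1)]
    observed = probs[heads]
    # Small relative tolerance so exact ties are not lost to floating-point rounding
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))

# Ten heads in ten tosses of a supposedly fair coin
print(binomial_two_sided_p(10, 10))  # 0.001953125
```

Ten heads out of ten would happen by chance with a fair coin only about twice in a thousand such experiments (counting ten tails as equally extreme), which is the kind of evidence that makes the trust look misplaced.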

Comparison of means: the t test

A t test is used to compare the means of two data sets, and it relies on calculation of a test statistic called t. This statistic is derived from the two data sets and it is defined as the difference between the means of the two data sets, x̅₁ and x̅₂ (or the difference between a mean x̅ and …
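
The excerpt stops before saying what the difference in means is divided by, so the sketch below makes an explicit choice: Welch's two-sample t, which scales x̅₁ − x̅₂ by a standard error that does not assume equal variances, together with the Welch–Satterthwaite degrees of freedom. A pooled-variance (Student's) t would use a different denominator. The function name `welch_t` and the samples are illustrative.

```python
from math import sqrt
from statistics import fmean, variance

def welch_t(sample1, sample2):
    """Welch's t statistic and its approximate (Welch-Satterthwaite) degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    # Squared standard errors of each mean, using the sample (n - 1) variance
    v1, v2 = variance(sample1) / n1, variance(sample2) / n2
    t = (fmean(sample1) - fmean(sample2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Two made-up samples whose means differ by 1
print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # (-1.0, 8.0)
```

The sign of t only records which mean is larger; for a two-sided test it is |t| that is compared with the t distribution on the given degrees of freedom.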

Comparison of variances: the F test

An F test is used to compare the variances of two data sets. As it is used to compare variances, the dependent data must – by definition – be numeric. As it is used to compare two distinct sets of data, these sets represent the two levels of a factor. The test statistic we use …
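
As a minimal sketch, assuming the statistic is the ratio of the two sample variances, the Python below puts the larger variance on top so that F ≥ 1. That is one common convention, not necessarily the one the full post uses. It also returns the numerator and denominator degrees of freedom. The function name `f_ratio` and the data are hypothetical.

```python
from statistics import variance

def f_ratio(sample1, sample2):
    """Return (F, df_numerator, df_denominator), with the larger sample variance on top."""
    v1, v2 = variance(sample1), variance(sample2)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1

# Made-up samples: the second is the first doubled, so its variance is four times larger
print(f_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # (4.0, 4, 4)
```

Putting the larger variance on top means only the upper tail of the F distribution needs to be consulted, with the p-value doubled for a two-sided test.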