R

At the risk of sounding like a broken record, you’re going to need three things to write R.

  • A computer.
  • The R interpreter.
  • A text editor.

R is a free software environment for statistical computing and graphics. You can download a copy from the Comprehensive R Archive Network (CRAN). Versions are available for Windows, macOS and Unixen. Like Perl, R can run scripts written with a text editor and saved to a file; however, it is common to use R through its interactive command-line interface, at the > prompt.

Text that should be input into the R command line will look like this, in red:

t.test( c(1,2,3,4), c(4,6,7,6) )

In R itself, each input line will be preceded by the > prompt, but I have left these out so that you can copy-and-paste input text directly into R and run it without any modification.

Text that is output from R will look like this, in blue:

Welch Two Sample t-test

data: c(1, 2, 3, 4) and c(4, 6, 7, 6)
t = -3.6056, df = 5.996, p-value = 0.0113
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-5.455968 -1.044032
sample estimates:
mean of x mean of y
2.50 5.75
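
As a small sketch of how the printed output above relates to the object R returns (the variable name res is only for illustration), the same test can be stored and its components pulled out individually:

```r
# Store the result of the Welch t test instead of just printing it.
res <- t.test(c(1, 2, 3, 4), c(4, 6, 7, 6))

# Individual pieces of the printed output are available as list elements.
res$statistic   # the t value
res$parameter   # the (Welch-adjusted) degrees of freedom
res$p.value     # the p-value
res$conf.int    # the 95 percent confidence interval
res$estimate    # the two sample means
```

Printing res at the prompt gives the full block of output shown above.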

For everyday programming (which, in my job, involves a lot of text-munging) I use Perl, and I mostly use R in the limited domain of statistics and graphing. This tutorial therefore does not cover all the ins-and-outs of R objects, closures, modules, functional programming, etc. However, it does include some of the theory underlying the statistical tests. Unlike the Perl tutorial, which is more of a reference work, this tutorial is based around a large number of worked exercises, which will hopefully help you practise the material and stretch you beyond it.

  1. Running R code. Saving code to a file and running it in R. Loading data into R from CSVs. Vectors and functions for manipulating them. c, mean, sum, max, min, seq and rep.
  2. Kinds of data. Discrete, categorical, (un)bounded, continuous, ordinal, numeric, and friends.
  3. Formatting data. Saving data from spreadsheets for import into R. read, data.frame and write.
  4. Plotting data. Boxplots and basic graphical parameters in R. plot, boxplot and the ~ “modelled on” syntax.
  5. Descriptive statistics. Measures of central tendency. Measures of data variability. Histograms and degrees of freedom. hist, var, sd, median and summary.
  6. Statistical testing. Fisher p-value and significance, binomial distributions. binom.test. Distribution functions: dbinom, pbinom and qbinom.
  7. Statistical power. Neyman-Pearson hypothesis testing. Type I and II errors. Random variables from distributions: rbinom. Installing packages.
  8. The F test. Comparison of the variance of two sets of continuous data. var.test, and rnorm.
  9. The t test. Comparison of the means of two sets of continuous data. t.test, tapply and qqnorm.
  10. Linear regression. Correlation of two sets of continuous data. Checking residuals for normality. lm, segments, abline, predict, residuals and fitted.
  11. The χ² test. Comparison of expected and observed count data. chisq.test, head, str, table, list, matrix and dimnames.
  12. Analysis of variance (ANOVA). Basics, and one-way ANOVA. aov, anova and TukeyHSD.
  13. Two way ANOVA. Interaction plots and model simplification. interaction.plot and update.
  14. Nonlinear regression. Non-linear least squares curve fitting. nls, coef, log and lines.
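
As a taster of the vector functions named in the first post, here is a minimal sketch; the vector v and its values are made up purely for illustration:

```r
# Build a numeric vector with c() and summarise it.
v <- c(2, 4, 6, 8)
mean(v)   # arithmetic mean
sum(v)    # total
max(v)    # largest value
min(v)    # smallest value

# seq() generates regular sequences; rep() repeats values.
s <- seq(1, 10, by = 3)
r <- rep(c("a", "b"), times = 2)
s
r
```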

The posts above don’t cover every conceivable test, but here’s a handy flowchart that will help you find out what test you actually need. It’s not perfect though: an ANOVA is a perfectly sensible technique for analysing a one-factor two-level design with a continuous response variable with normal errors, but the flowchart will lead you to the t test. There’s rarely a single, correct answer to ‘how should I analyse this data set?’, but there are certainly many answers to that question!
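
The overlap between ANOVA and the t test can be sketched with the small data set used earlier on this page. The names response and group are made up for illustration. The sketch assumes equal variances in the two groups (var.equal = TRUE), because R's default t.test is the Welch version, which does not match a one-way ANOVA exactly:

```r
# One factor with two levels, continuous response.
response <- c(1, 2, 3, 4, 4, 6, 7, 6)
group    <- factor(rep(c("a", "b"), each = 4))

# Classical (pooled-variance) two-sample t test.
tt <- t.test(response ~ group, var.equal = TRUE)

# One-way ANOVA on the same data.
av <- anova(lm(response ~ group))

# The F statistic is the square of the t statistic,
# and the two p-values agree.
unname(tt$statistic)^2
av[["F value"]][1]
tt$p.value
av[["Pr(>F)"]][1]
```

With only two groups the two analyses are the same test in different clothes, which is why either route is defensible.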

Statistics flowchart [Public domain]

