At the risk of sounding like a broken record, you’re going to need three things to write R.

- A computer.
- The `R` interpreter.
- A text editor.

R is a free software environment for statistical computing and graphics. You can download a copy from the Comprehensive R Archive Network (CRAN). Versions are available for Windows, macOS and Unixen. Like Perl, R can run scripts written with a text editor and saved to file; however, it is common to use R through its interactive command-line interface, at the `>` prompt.

Text that should be input into the R command line will look like this, in red:

    t.test( c(1,2,3,4), c(4,6,7,6) )

In R itself, each input line will be preceded by the `>` prompt, but I have left these off so that you can copy and paste the input text directly into R and run it without any modification.

Text that is output from R will look like this, in blue:

        Welch Two Sample t-test

    data:  c(1, 2, 3, 4) and c(4, 6, 7, 6)
    t = -3.6056, df = 5.996, p-value = 0.0113
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -5.455968 -1.044032
    sample estimates:
    mean of x mean of y
         2.50      5.75
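The printed summary is only one view of the result. As a minimal sketch (the variable name `res` is my own choice, not something used elsewhere in the tutorial), the object returned by `t.test` can be stored and its components pulled out by name:

```r
# Store the result of the Welch t test instead of just printing it
res <- t.test( c(1,2,3,4), c(4,6,7,6) )

# Individual components of the result are available by name
res$statistic   # the t value
res$parameter   # the Welch-adjusted degrees of freedom
res$p.value     # the p-value
res$conf.int    # the 95 percent confidence interval
res$estimate    # the two sample means
```

Typing `res` on its own at the prompt prints the same summary as shown above.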

For everyday programming (which, in my job, involves a lot of text-munging), I use Perl, so this tutorial for R does *not* cover all the ins-and-outs of R objects, closures, modules, functional programming, *etc.*, as I mostly use R in the limited domain of statistics and graphing. However, it does include some of the theory underlying the statistical tests. Unlike the Perl tutorial, which is more of a reference work, this tutorial is based around a large number of worked exercises, which will hopefully help you practice the material and stretch you beyond it.

- Running R code. Saving and running code to run in R. Loading data into R from CSVs. Vectors and functions for manipulating them. `c`, `mean`, `sum`, `max`, `min`, `seq` and `rep`.
- Kinds of data. Discrete, categorical, (un)bounded, continuous, ordinal, numeric, and friends.
- Formatting data. Saving data from spreadsheets for import into R. `read`, `data.frame` and `write`.
- Plotting data. Boxplots and basic graphical parameters in R. `plot`, `boxplot` and the `~` “modelled on” syntax.
- Descriptive statistics. Measures of central tendency. Measures of data variability. Histograms and degrees of freedom. `hist`, `var`, `sd`, `median` and `summary`.
- Statistical testing. Fisher *p*-value and significance, binomial distributions. `binom.test`. Distribution functions: `dbinom`, `pbinom` and `qbinom`.
- Statistical power. Neyman-Pearson hypothesis testing. Type I and II errors. Random variables from distributions: `rbinom`. Installing packages.
- The *F* test. Comparison of the variance of two sets of continuous data. `var.test`, and `rnorm`.
- The *t* test. Comparison of the means of two sets of continuous data. `t.test`, `tapply` and `qqnorm`.
- Linear regression. Correlation of two sets of continuous data. Checking residuals for normality. `lm`, `segments`, `abline`, `predict`, `residuals` and `fitted`.
- The *χ*² test. Comparison of expected and observed count data. `chisq.test`, `head`, `str`, `table`, `list`, `matrix` and `dimnames`.
- Analysis of variance (ANOVA). Basics, and one-way ANOVA. `aov`, `anova` and `TukeyHSD`.
- Two way ANOVA. Interaction plots and model simplification. `interaction.plot` and `update`.
- Nonlinear regression. Non-linear least squares curve fitting. `nls`, `coef`, `log` and `lines`.
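As a small taste of the vector functions named in the first item, here is a sketch using made-up numbers (the vector `x` and the names `s` and `r` are purely illustrative and not taken from the posts themselves):

```r
# A small numeric vector built with c()
x <- c(4, 8, 15, 16, 23, 42)

mean(x)   # arithmetic mean
sum(x)    # total
max(x)    # largest value
min(x)    # smallest value

# seq() builds regular sequences; rep() repeats values
s <- seq(from = 1, to = 9, by = 2)   # the odd numbers from 1 to 9
r <- rep(c("a", "b"), times = 3)     # "a" "b" repeated three times
```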

The posts above don’t cover every conceivable test, but here’s a handy flowchart that will help you find out what test you actually need. It’s not perfect though: an ANOVA is a perfectly sensible technique for analysing a one-factor two-level design with a continuous response variable with normal errors, but the flowchart will lead you to the *t* test. There’s rarely a single, correct answer to ‘how should I analyse this data set?’, but there are certainly many *wrong* answers to that question!
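The point about ANOVA and the *t* test can be checked numerically. The following is a sketch that reuses the two small samples from the introduction (the data frame `d` and its column names are my own invention). For a single factor with two levels, a one-way ANOVA and an equal-variance two-sample *t* test give the same *p*-value, with *F* equal to *t*². Note that `var.equal = TRUE` is needed for the comparison, because `t.test` defaults to the Welch test, which does not assume equal variances.

```r
# Two groups in 'long' format: one response column, one factor column
d <- data.frame(
  response = c(1, 2, 3, 4, 4, 6, 7, 6),
  group    = factor(rep(c("x", "y"), each = 4))
)

# Classical (equal-variance) two-sample t test, using the ~ formula syntax
tt <- t.test(response ~ group, data = d, var.equal = TRUE)

# One-way ANOVA on the same data
av <- anova(aov(response ~ group, data = d))

t_stat <- unname(tt$statistic)
f_stat <- av[["F value"]][1]
p_t    <- tt$p.value
p_f    <- av[["Pr(>F)"]][1]

all.equal(t_stat^2, f_stat)  # F is the square of t
all.equal(p_t, p_f)          # and the two p-values agree
```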