At the risk of sounding like a broken record, you’re going to need three things to write R.

- A computer.
- The `R` interpreter.
- A text editor.

R is a free software environment for statistical computing and graphics. You can download a copy from the Comprehensive R Archive Network (CRAN). Versions are available for Windows, macOS and Unixen. Like Perl, R can run scripts written with a text editor and saved to file; however, it is common to use R through its interactive command-line interface, at the `>` prompt.
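As a rough sketch of the script-based route, the snippet below writes a one-line script to a throwaway temporary file and then runs it from inside R with `source()`. The file and the variable name `x` are purely illustrative, not something the later posts rely on.

```r
# Create a temporary script file (a stand-in for a file saved from a text editor)
script <- tempfile(fileext = ".R")
writeLines("x <- mean(c(1, 2, 3, 4))", script)

# Run the saved script from within an R session
source(script)

x  # 2.5
```

A script saved this way can also be run from the operating system's command line with `Rscript`, followed by the path to the file.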

Text that should be input into the R command line will look like this, in red:

```r
t.test( c(1,2,3,4), c(4,6,7,6) )
```

In R itself, each input line will be preceded by the > prompt, but I have missed these off so that you will be able to copy-and-paste input text directly into R, and have it run without any modification.

Text that is output from R will look like this, in blue:

```
	Welch Two Sample t-test

data:  c(1, 2, 3, 4) and c(4, 6, 7, 6)
t = -3.6056, df = 5.996, p-value = 0.0113
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.455968 -1.044032
sample estimates:
mean of x mean of y 
     2.50      5.75 
```
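The printed report is convenient to read, but the same numbers can also be pulled out individually. A minimal sketch, assuming you store the result in a variable (the name `result` is my own choice here): `t.test` returns a list-like object whose components can be accessed by name.

```r
# Store the test result instead of just printing it
result <- t.test(c(1, 2, 3, 4), c(4, 6, 7, 6))

# Individual pieces of the report, accessed by name
result$statistic  # the t statistic
result$parameter  # the (Welch-adjusted) degrees of freedom
result$p.value    # the p-value
result$conf.int   # the 95 percent confidence interval
result$estimate   # the two sample means
```

This is handy when you want to reuse a single value, such as the *p*-value, in a later calculation rather than copying it from the screen.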

For everyday programming (which, in my job, involves a lot of text-munging), I use Perl, so this tutorial does *not* cover all the ins-and-outs of R objects, closures, modules, functional programming, *etc.*, as I mostly use R in the limited domain of statistics and graphing. However, it does include some of the theory underlying the statistical tests. Unlike the Perl tutorial, which is more of a reference work, this tutorial is based around a large number of worked exercises, which will hopefully help you practise the material and stretch you beyond it.

- Running R code. Saving and running code to run in R. Loading data into R from CSVs. Vectors and functions for manipulating them. `c`, `mean`, `sum`, `max`, `min`, `seq` and `rep`.
- Kinds of data. Discrete, categorical, (un)bounded, continuous, ordinal, numeric, and friends.
- Formatting data. Saving data from spreadsheets for import into R. `read`, `data.frame` and `write`.
- Plotting data. Boxplots and basic graphical parameters in R. `plot`, `boxplot` and the `~` “modelled on” syntax.
- Descriptive statistics. Measures of central tendency. Measures of data variability. Histograms and degrees of freedom. `hist`, `var`, `sd`, `median` and `summary`.
- Statistical testing. Fisher *p*-value and significance, binomial distributions. `binom.test`. Distribution functions: `dbinom`, `pbinom` and `qbinom`.
- Statistical power. Neyman-Pearson hypothesis testing. Type I and II errors. Random variables from distributions: `rbinom`. Installing packages.
- The *F* test. Comparison of the variance of two sets of continuous data. `var.test`, and `rnorm`.
- The *t* test. Comparison of the means of two sets of continuous data. `t.test`, `tapply` and `qqnorm`.
- Linear regression. Correlation of two sets of continuous data. Checking residuals for normality. `lm`, `segments`, `abline`, `predict`, `residuals` and `fitted`.
- The *χ*² test. Comparison of expected and observed count data. `chisq.test`, `head`, `str`, `table`, `list`, `matrix` and `dimnames`.
- Analysis of variance (ANOVA). Basics, and one-way ANOVA. `aov`, `anova` and `TukeyHSD`.
- Two way ANOVA. Interaction plots and model simplification. `interaction.plot` and `update`.
- Nonlinear regression. Non-linear least squares curve fitting. `nls`, `coef`, `log` and `lines`.
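As a small taste of the vector functions named in the first post, here is a minimal sketch using one of the vectors from the *t* test example above. The variable name `y` is just for illustration.

```r
# Combine four numbers into a vector with c()
y <- c(4, 6, 7, 6)

mean(y)  # 5.75
sum(y)   # 23
max(y)   # 7
min(y)   # 4

# Build regular sequences and repeated patterns
seq(1, 10, by = 3)       # 1 4 7 10
rep(c(1, 2), times = 2)  # 1 2 1 2
```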

The posts above don’t cover every conceivable test, but here’s a handy flowchart that will help you find out what test you actually need. It’s not perfect though: an ANOVA is a perfectly sensible technique for analysing a one-factor two-level design with a continuous response variable with normal errors, but the flowchart will lead you to *t*. There’s rarely a single, correct answer to ‘how should I analyse this data set?’, but there are certainly many *wrong* answers to that question!
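To illustrate the point about ANOVA and *t*, here is a rough sketch that reuses the two small samples from the example at the top of this page. It assumes equal variances (`var.equal = TRUE`, i.e. Student's *t* test rather than R's default Welch test), and the names `response`, `group`, `tt` and `tab` are my own. Under that assumption, a one-way ANOVA on a two-level factor gives *F* equal to *t*² and the same *p*-value.

```r
# The two samples from the t test example, stacked into one response vector
response <- c(1, 2, 3, 4, 4, 6, 7, 6)
group <- factor(rep(c("x", "y"), each = 4))

# Student's t test (equal variances assumed, unlike the default Welch test)
tt <- t.test(response ~ group, var.equal = TRUE)

# One-way ANOVA on the same one-factor, two-level design
tab <- summary(aov(response ~ group))[[1]]

tt$statistic^2       # t squared...
tab[["F value"]][1]  # ...matches the F statistic

tt$p.value           # and the two p-values
tab[["Pr(>F)"]][1]   # agree as well
```

With Welch's correction (the default in `t.test`) the two analyses no longer coincide in general, which is one reason the choice of test is a judgement call rather than a lookup.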

## 3 comments

Please could you do a tutorial for mixed effect models?

Author

That’s extremely tempting, but it would probably need me to write at least one, probably two, on generalised linear models first. I’ll keep it in mind, but can’t promise anything!

ooh okay, thank you 🙂