At the risk of sounding like a broken record, you’re going to need three things to write R.
- A computer.
- A text editor.
- R itself.
R is a free software environment for statistical computing and graphics. You can download a copy from the Comprehensive R Archive Network (CRAN). Versions are available for Windows, macOS and Unixen. Like Perl, R can run scripts written with a text editor and saved to file; however, it is common to use R through its interactive command-line interface.
Text that should be input into the R command line will look like this, in red:
t.test( c(1,2,3,4), c(4,6,7,6) )
In R itself, each input line will be preceded by the > prompt, but I have missed these off so that you will be able to copy-and-paste input text directly into R, and have it run without any modification.
Text that is output from R will look like this, in blue:
Welch Two Sample t-test

data: c(1, 2, 3, 4) and c(4, 6, 7, 6)
t = -3.6056, df = 5.996, p-value = 0.0113
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.455968 -1.044032
sample estimates:
mean of x mean of y
     2.50      5.75
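As a small aside on the example above: t.test returns an object (of class "htest") rather than just printing text, so individual numbers can be pulled out by name. The sketch below reuses the same two vectors; the variable names x, y and result are purely illustrative.

```r
# Same data as the example above; the names x, y and result are illustrative
x <- c(1, 2, 3, 4)
y <- c(4, 6, 7, 6)

# Store the test result instead of just printing it
result <- t.test(x, y)

# Components can be extracted by name
result$statistic   # the t value
result$parameter   # the (Welch-adjusted) degrees of freedom
result$p.value     # the p-value
result$conf.int    # the 95 percent confidence interval
```

Typing result on its own at the prompt prints the full summary shown above.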
For everyday programming (which, in my job, involves a lot of text-munging), I use Perl, so this tutorial for R does not cover all the ins and outs of R objects, closures, modules, functional programming, etc., as I mostly use R in the limited domain of statistics and graphing. However, it does include some of the theory underlying the statistical tests. Unlike the Perl tutorial, which is more of a reference work, this tutorial is based around a large number of worked exercises, which will hopefully help you practice the material and stretch you beyond it.
- Running R code. Saving and running code in R. Loading data into R from CSVs. Vectors and functions for manipulating them.
- Kinds of data. Discrete, categorical, (un)bounded, continuous, ordinal, numeric, and friends.
- Formatting data. Saving data from spreadsheets for import into R.
- Plotting data. Boxplots and basic graphical parameters in R. The ~ “modelled on” syntax.
- Descriptive statistics. Measures of central tendency. Measures of data variability. Histograms and degrees of freedom.
- Statistical testing. Fisher p-value and significance, binomial distributions. binom.test. Distribution functions.
- Statistical power. Neyman-Pearson hypothesis testing. Type I and II errors. Random variables from distributions: rbinom. Installing packages.
- The F test. Comparison of the variance of two sets of continuous data.
- The t test. Comparison of the means of two sets of continuous data.
- Linear regression. Correlation of two sets of continuous data. Checking residuals for normality.
- The χ² test. Comparison of expected and observed count data.
- Analysis of variance (ANOVA). Basics, and one-way ANOVA.
- Two way ANOVA. Interaction plots and model simplification.
- Nonlinear regression. Non-linear least squares curve fitting.
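The ~ “modelled on” syntax mentioned in the list is easiest to see with a tiny example. The sketch below uses a made-up data frame (the names scores, value and group are hypothetical, and the numbers are just the two vectors from the t test example at the top of this page) to show the same formula driving a boxplot, an F test and a t test.

```r
# Hypothetical data frame: one continuous response, one two-level factor
scores <- data.frame(
  value = c(1, 2, 3, 4, 4, 6, 7, 6),
  group = factor(rep(c("a", "b"), each = 4))
)

# "value modelled on group": draws one box per level of group
boxplot(value ~ group, data = scores)

# The same formula works for the F test comparing the two variances...
f_result <- var.test(value ~ group, data = scores)
f_result$statistic

# ...and for the (Welch) t test comparing the two means
t_result <- t.test(value ~ group, data = scores)
t_result$p.value
```

Reading value ~ group aloud as "value modelled on group" is a handy way to remember which side the response variable goes on.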
The posts above don’t cover every conceivable test, but here’s a handy flowchart that will help you find out what test you actually need. It’s not perfect though: an ANOVA is a perfectly sensible technique for analysing a one-factor, two-level design with a continuous response variable and normal errors, but the flowchart will lead you to a t test. There’s rarely a single, correct answer to ‘how should I analyse this data set?’, but there are certainly many answers to that question!
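To make the point about the one-factor, two-level design concrete, here is a rough sketch using the same made-up numbers as the first example on this page. Assuming equal variances in the two groups (var.equal = TRUE, which is not the t.test default), the classical t test and a one-way ANOVA should give the same p-value, with the F statistic equal to the square of t. The names d, t_res and a_res are illustrative only.

```r
# Hypothetical one-factor, two-level data set
d <- data.frame(
  value = c(1, 2, 3, 4, 4, 6, 7, 6),
  group = factor(rep(c("a", "b"), each = 4))
)

# Classical (pooled-variance) two-sample t test; note var.equal = TRUE
t_res <- t.test(value ~ group, data = d, var.equal = TRUE)

# One-way ANOVA on the same data, via a linear model
a_res <- anova(lm(value ~ group, data = d))

# The two p-values agree
t_res$p.value
a_res[["Pr(>F)"]][1]

# And the ANOVA F statistic is the t statistic squared
unname(t_res$statistic)^2
a_res[["F value"]][1]
```

With the default Welch correction (var.equal = FALSE) the two approaches no longer match exactly, which is one reason the flowchart and the ANOVA route can both be defensible.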