R

At the risk of sounding like a broken record, you’re going to need three things to write R.

  • A computer.
  • The R interpreter.
  • A text editor.

R is a free software environment for statistical computing and graphics. You can download a copy from the Comprehensive R Archive Network (CRAN). Versions are available for Windows, macOS and Unixen. Like Perl, R can run scripts written with a text editor and saved to file; however, it is common to use R through its interactive command-line interface, at the > prompt.
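As a minimal sketch of the script route (the script contents and the variable names are made up for illustration), the following writes a two-line script to a temporary file and then runs it from within R using source(). In practice you would write the file in your text editor rather than with writeLines.

```r
# Hypothetical script: two lines of R saved to a temporary file.
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- c(1, 2, 3, 4)",
             "script_mean <- mean(x)"),
           script_path)

# Run the saved script; by default, the objects it creates
# end up in the global workspace.
source(script_path)
script_mean
```

From a shell, running Rscript followed by the file name is the usual non-interactive alternative.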

Text that should be input into the R command line will look like this, in red:

t.test( c(1,2,3,4), c(4,6,7,6) )

In R itself, each input line will be preceded by the > prompt, but I have missed these off so that you will be able to copy-and-paste input text directly into R, and have it run without any modification.

Text that is output from R will look like this, in blue:

Welch Two Sample t-test

data: c(1, 2, 3, 4) and c(4, 6, 7, 6)
t = -3.6056, df = 5.996, p-value = 0.0113
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.455968 -1.044032
sample estimates:
mean of x mean of y
     2.50      5.75
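The printed report is convenient to read, but the individual numbers can also be pulled out of the object that t.test returns. A small sketch, using the same input as above (the variable name result is just an illustrative choice):

```r
# Store the test result rather than printing it.
result <- t.test(c(1, 2, 3, 4), c(4, 6, 7, 6))

result$statistic   # the t value
result$parameter   # the Welch-adjusted degrees of freedom
result$p.value     # the p-value
result$conf.int    # the 95 percent confidence interval
result$estimate    # the two sample means
```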

For everyday programming (which, in my job, involves a lot of text-munging), I use Perl, and I mostly use R in the limited domain of statistics and graphing. This tutorial therefore does not cover all the ins and outs of R objects, closures, modules, functional programming, etc. However, it does include some of the theory underlying the statistical tests. Unlike the Perl tutorial, which is more of a reference work, this tutorial is based around a large number of worked exercises, which will hopefully help you practise the material and stretch you beyond it.

  1. Running R code. Saving code to file and running it in R. Loading data into R from CSVs. Vectors and functions for manipulating them. c, mean, sum, max, min, seq and rep.
  2. Kinds of data. Discrete, categorical, (un)bounded, continuous, ordinal, numeric, and friends.
  3. Formatting data. Saving data from spreadsheets for import into R. read, data.frame and write.
  4. Plotting data. Boxplots and basic graphical parameters in R. plot, boxplot and the ~ “modelled on” syntax.
  5. Descriptive statistics. Measures of central tendency. Measures of data variability. Histograms and degrees of freedom. hist, var, sd, median and summary.
  6. Statistical testing. Fisher p-value and significance, binomial distributions. binom.test. Distribution functions: dbinom, pbinom and qbinom.
  7. Statistical power. Neyman-Pearson hypothesis testing. Type I and II errors. Random variables from distributions: rbinom. Installing packages.
  8. The F test. Comparison of the variance of two sets of continuous data. var.test, and rnorm.
  9. The t test. Comparison of the means of two sets of continuous data. t.test, tapply and qqnorm.
  10. Linear regression. Correlation of two sets of continuous data. Checking residuals for normality. lm, segments, abline, predict, residuals and fitted.
  11. The χ² test. Comparison of expected and observed count data. chisq.test, head, str, table, list, matrix and dimnames.
  12. Analysis of variance (ANOVA). Basics, and one-way ANOVA. aov, anova and TukeyHSD.
  13. Two way ANOVA. Interaction plots and model simplification. interaction.plot and update.
  14. Nonlinear regression. Non-linear least squares curve fitting. nls, coef, log and lines.
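To give a flavour of the vector functions named in the first post, here is a short sketch on made-up numbers (the variable names are illustrative only):

```r
# Made-up data for illustration.
counts <- c(2, 4, 6, 8)

mean(counts)   # arithmetic mean: 5
sum(counts)    # total: 20
max(counts)    # largest value: 8
min(counts)    # smallest value: 2

# seq builds regular sequences; rep repeats values.
steps <- seq(1, 10, by = 3)           # 1 4 7 10
tags  <- rep(c("a", "b"), times = 2)  # "a" "b" "a" "b"
```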

The posts above don’t cover every conceivable test, but here’s a handy flowchart that will help you find out what test you actually need. It’s not perfect though: an ANOVA is a perfectly sensible technique for analysing a one-factor, two-level design with a continuous response variable and normal errors, but the flowchart will lead you to the t test. There’s rarely a single, correct answer to ‘how should I analyse this data set?’, but there are certainly many answers to that question!

Statistics flowchart [Public domain]
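The point about the one-factor, two-level design can be checked directly. The sketch below reuses the numbers from the t.test example at the top of the page, arranged as a response and a made-up two-level grouping factor. It assumes equal variances (var.equal = TRUE), because t.test otherwise defaults to the Welch version, which does not correspond exactly to a one-way ANOVA.

```r
# One factor with two levels, continuous response (illustrative data).
response <- c(1, 2, 3, 4, 4, 6, 7, 6)
group    <- factor(rep(c("a", "b"), each = 4))

# Student's t test, assuming equal variances in the two groups.
t_result <- t.test(response ~ group, var.equal = TRUE)

# One-way ANOVA on the same data; [[1]] extracts the ANOVA table.
aov_table <- summary(aov(response ~ group))[[1]]

t_result$p.value            # p-value from the t test
aov_table[["Pr(>F)"]][1]    # p-value from the ANOVA
t_result$statistic^2        # t squared...
aov_table[["F value"]][1]   # ...matches the F statistic
```

With two groups, the F statistic is the square of t and the two p-values agree, so either route gives the same answer for this design.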

5 comments

    • student on 2017-07-18 at 15:49

    Please could you do a tutorial for mixed effect models?

  1. That’s extremely tempting, but it would probably need me to write at least one, probably two, on generalised linear models first. I’ll keep it in mind, but can’t promise anything!

      • student on 2017-07-21 at 16:26

      ooh okay, thank you 🙂

  2. Can I have permission to use your Stentor diagram on my web site http://www.canadiannaturephotographer.com? I am writing an article which will contain my own photomicrographs of Stentors and other ciliates. The site’s purpose is education and inspiration. You are welcome to use some of my photographs on your site if you like.

    The article is to be posted in about 1-2 weeks; it is still under development.
    Cheers
    Robert Berdan
    Calgary, Alberta

    1. Yes, certainly. All my images and text are released under a Creative Commons Attribution ShareAlike license, so if you’re just planning on using it in a post on your site, you should feel free to reuse them as long as you say where they’re from. I look forward to seeing your micrographs!
