At the risk of sounding like a broken record, you’re going to need three things to write R.
- A computer.
- The R interpreter.
- A text editor.
R is a free software environment for statistical computing and graphics. You can download a copy from the Comprehensive R Archive Network (CRAN). Versions are available for Windows, macOS and Unixen. Like Perl, R can run scripts written in a text editor and saved to file; however, it is common to use R through its interactive command-line interface, at the > prompt.
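As a minimal sketch of the script route: source() reads a saved file and runs each line at the > prompt. The file name analysis.R and its two lines are hypothetical; the example writes them to a temporary file first so that it runs anywhere without you creating anything by hand.

```r
# Stand-in for a script saved from your text editor (hypothetical name "analysis.R");
# tempdir() is used so the example runs without touching your own files
script_path <- file.path(tempdir(), "analysis.R")
writeLines(c(
  "scores <- c(1, 2, 3, 4)",
  "cat('Mean score:', mean(scores), '\\n')"
), script_path)

# At the > prompt, source() runs every line of the file in the current session,
# so 'scores' is available afterwards
source(script_path)

# From a system shell (not the R prompt) the equivalent would be:
#   Rscript analysis.R
```

Running the file through source() leaves its variables in your workspace, which is handy when you want to keep exploring the results interactively.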
Text that should be input into the R command line will look like this, in red:
```r
t.test( c(1,2,3,4), c(4,6,7,6) )
```
In R itself, each input line will be preceded by the > prompt, but I have left these off so that you can copy and paste input text directly into R and have it run without any modification.
Text that is output from R will look like this, in blue:
```
	Welch Two Sample t-test

data:  c(1, 2, 3, 4) and c(4, 6, 7, 6)
t = -3.6056, df = 5.996, p-value = 0.0113
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.455968 -1.044032
sample estimates:
mean of x mean of y 
     2.50      5.75 
```
For everyday programming (which, in my job, involves a lot of text-munging), I use Perl, so this tutorial does not cover all the ins and outs of R objects, closures, modules, functional programming, etc.; I mostly use R in the limited domain of statistics and graphing. It does, however, include some of the theory underlying the statistical tests. Unlike the Perl tutorial, which is more of a reference work, this tutorial is built around a large number of worked exercises, which will hopefully help you practise the material and stretch you beyond it.
- Running R code. Saving code to a file and running it in R. Loading data into R from CSVs. Vectors and functions for manipulating them: c, mean, sum, max, min, seq and rep.
- Kinds of data. Discrete, categorical, (un)bounded, continuous, ordinal, numeric, and friends.
- Formatting data. Saving data from spreadsheets for import into R. Functions: read, data.frame and write.
- Plotting data. Boxplots and basic graphical parameters in R. Functions: plot and boxplot, and the ~ “modelled on” syntax.
- Descriptive statistics. Measures of central tendency. Measures of data variability. Histograms and degrees of freedom. Functions: hist, var, sd, median and summary.
- Statistical testing. Fisher p-value and significance, binomial distributions. Functions: binom.test. Distribution functions: dbinom, pbinom and qbinom.
- Statistical power. Neyman-Pearson hypothesis testing. Type I and II errors. Random variables from distributions: rbinom. Installing packages.
- The F test. Comparison of the variances of two sets of continuous data. Functions: var.test and rnorm.
- The t test. Comparison of the means of two sets of continuous data. Functions: t.test, tapply and qqnorm.
- Linear regression. Correlation of two sets of continuous data. Checking residuals for normality. Functions: lm, segments, abline, predict, residuals and fitted.
- The χ² test. Comparison of expected and observed count data. Functions: chisq.test, head, str, table, list, matrix and dimnames.
- Analysis of variance (ANOVA). Basics, and one-way ANOVA. Functions: aov, anova and TukeyHSD.
- Two-way ANOVA. Interaction plots and model simplification. Functions: interaction.plot and update.
- Nonlinear regression. Nonlinear least-squares curve fitting. Functions: nls, coef, log and lines.
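For a quick taste of the first post, here is a short sketch using the vector functions it lists. The numbers in x are made up purely for illustration.

```r
# c() combines values into a vector (illustrative numbers)
x <- c(3, 1, 4, 1, 5)

mean(x)  # arithmetic mean: 2.8
sum(x)   # total: 14
max(x)   # largest value: 5
min(x)   # smallest value: 1

# seq() builds regular sequences; rep() repeats values
seq(1, 10, by = 3)           # 1 4 7 10
rep(c("a", "b"), times = 2)  # "a" "b" "a" "b"
rep(c("a", "b"), each = 2)   # "a" "a" "b" "b"
```

Note the difference between times (repeat the whole vector) and each (repeat every element in turn) in rep().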
The posts above don’t cover every conceivable test, but here’s a handy flowchart that will help you find out which test you actually need. It’s not perfect, though: an ANOVA is a perfectly sensible technique for analysing a one-factor, two-level design with a continuous response variable and normal errors, but the flowchart will lead you to a t test. There’s rarely a single, correct answer to ‘how should I analyse this data set?’, but there are certainly many answers to that question!
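To illustrate the point about ANOVA and the t test, here is a sketch that reuses the two small samples from the t.test example near the top. With one factor at two levels, a one-way ANOVA gives the same p-value as a t test that assumes equal variances (var.equal = TRUE; t.test's default is Welch's test, which does not), and the ANOVA's F statistic is the square of t. The variable names are just for this example.

```r
# The two samples from the t.test example, stacked into one response
# vector with a two-level grouping factor
values <- c(1, 2, 3, 4, 4, 6, 7, 6)
group  <- factor(rep(c("x", "y"), each = 4))

# Student's (pooled-variance) t test, using the ~ "modelled on" syntax
t_res <- t.test(values ~ group, var.equal = TRUE)

# One-way ANOVA on the same design
aov_res   <- aov(values ~ group)
aov_table <- summary(aov_res)[[1]]  # data frame: Df, Sum Sq, Mean Sq, F value, Pr(>F)

f_value <- aov_table[["F value"]][1]  # first row is the 'group' term
p_aov   <- aov_table[["Pr(>F)"]][1]

# F equals t squared, and the two p-values agree
unname(t_res$statistic^2)
f_value
c(t_test = t_res$p.value, anova = p_aov)
```

For these data both statistics come out at 13, which is why either analysis is a defensible choice for this kind of design.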