MX-course_R_Intro

Overall Aims
• Introduce programming concepts relevant to MX
• Demonstrate the strengths (and weaknesses) of R
Introduction to R:
Joseph Powell
Books
• The R Book – Crawley (2007)
• Introductions to statistics using R
– Cohen Y. and Cohen J. Y. (2008). Statistics and Data with R.
– Crawley M. (2005). Statistics: An Introduction using R.
– Dalgaard P. (2002). Introductory Statistics with R.
– Maindonald J. & Braun J. (2003). Data Analysis and Graphics Using R: An Example-based Approach.
• Books on biological topics
– Paradis E. (2006). Analysis of Phylogenetics and Evolution with R.
– Broman K. W. & Sen S. (2009). A Guide to QTL Mapping with R/qtl.
– Bolker B.M. (2008). Ecological Models and Data in R.
• Books on statistical topics
– Aitkin M. et al. (2009). Statistical Modelling in R.
– Faraway J. (2009). Linear Models with R.
– Albert J. (2009). Bayesian Computation with R.
– Bivand R.S. et al. (2009). Applied Spatial Data Analysis with R.
– Cowpertwait P.S.P. & Metcalfe A.V. (2009). Introductory Time Series with R.
• Books on R specifics and R programming
– Spector P. (2008). Data Manipulation with R.
– Murrell P. (2006). R Graphics.
– Chambers J. M. (2008). Software for Data Analysis: Programming with R.
Websites
• Websites:
– CRAN R: http://www.r-project.org/
– R cookbook: http://www.r-cookbook.com/
– R graphics: http://addictedtor.free.fr/graphiques/
– R wiki: http://wiki.r-project.org/
– Mailing lists: http://www.r-project.org/mail.html
– R seek: http://www.rseek.org/
• Websites on statistical topics
– R genetics: http://rgenetics.org/trac/rgalaxy
– Bioconductor: http://www.bioconductor.org/
The console
• Load up R
• Console window appears, with a command prompt
• Everything in the R console can be partitioned into two
fundamental operations:
– Input variables
> x <- 2
– Output variables
> x
[1] 2
Objects
• Names
– Case sensitive, no spaces
– Must begin with a letter (or a dot not followed by a digit); may also contain numbers, "." and "_"
– Try to give your objects meaningful names
> My_f4vourite.langua6e_evR <- "R"
• x, y and My_f4v… are objects that we have created
> ls()              # lists all the objects we have created
> rm(y)             # deletes y (forever)
> rm(list=ls())     # deletes everything (..forever)
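The naming rules above can be tried out directly. This is a minimal sketch; the object names are made up for illustration, and make.names() is used only to show how R would repair names that break the rules.

```r
# R is case sensitive: these are two different objects
lamb_count <- 1600
Lamb_count <- 5
print(lamb_count == Lamb_count)

# make.names() shows how R would repair invalid names
fixed <- make.names(c("2nd_try", "my name", "ok_name"))
print(fixed)

# rm() removes an object; exists() confirms it has gone
rm(lamb_count)
print(exists("lamb_count", inherits = FALSE))
```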
Workspace 1
• Everything shown in this list of objects comprises our 'workspace'
> ls()
[1] "My_f4vourite.langua6e_evR" "x" "y"
> save.image(file="myworkspace.RData")
> rm(list=ls())
> ls()
character(0)
> load(file = "myworkspace.RData")
> ls()
[1] "My_f4vourite.langua6e_evR" "x" "y"
• Objects are internal to R
– They do not behave like files in a folder on the computer
– A saved .RData file uses an R-specific format, so other programs generally cannot read it
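A minimal sketch of saving and restoring objects without touching the real workspace. The temporary file paths and the "restored" environment are assumptions for illustration; the final step shows one way to get data out of R as plain text.

```r
x <- 2
y <- c(3, 4, 12)

# Save two objects to a temporary .RData file (illustrative path)
rdata_file <- tempfile(fileext = ".RData")
save(x, y, file = rdata_file)

# Load them into a separate environment instead of the workspace
restored <- new.env()
loaded_names <- load(rdata_file, envir = restored)
print(loaded_names)
print(restored$y)

# For use outside R, write plain text instead of .RData
csv_file <- tempfile(fileext = ".csv")
write.csv(data.frame(y = y), csv_file, row.names = FALSE)
print(readLines(csv_file))
```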
Workspace 2
• You can select which objects to save
> save(y, x, file = "two_objects.RData")
• Different computer folders can be accessed
> getwd()                      # shows the current working directory
> dir()                        # lists the files in that directory
> setwd("~/work_directory")    # sets R's focus to a different computer folder
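A short sketch of moving between folders safely. tempdir() stands in for a real project folder here (an assumption), and the original directory is restored at the end.

```r
old_dir <- getwd()        # remember where we started
work_dir <- tempdir()     # stand-in for a real project folder

setwd(work_dir)
writeLines("hello", "example.txt")
found <- "example.txt" %in% dir()   # the new file appears in the listing
print(found)

file.remove("example.txt")
setwd(old_dir)            # go back to the original folder
```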
Built-in functions
• Native functions make R succinct
• Diverse range available, from graphics to data manipulation to statistical algorithms etc.
• Highly optimised, so use them if they are available instead of writing your own
• Function structure:
> function_name(<argument 1>, <argument 2>, …)
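As an illustration of the function structure above, here is a sketch using round(); the weights are made-up values. Arguments can be matched by position or by name, and args() lists what a function accepts.

```r
weights <- c(22.92368, 27.52896, 25.52592)

# Positional arguments are matched in order
print(round(weights, 1))

# Naming the arguments makes the call self-documenting
rounded <- round(x = weights, digits = 1)
print(rounded)

# args() prints a function's arguments and their defaults
print(args(round))
```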
Missing values
• NA is a "reserved" word in R
• It is a single element (length 1) that indicates a missing value
• A helpful alternative to coding missing values as numbers (e.g. -99)
> my_array <- c(NA,100,120,120,120,130,NA)
> sum(my_array)
[1] NA
> sum(my_array, na.rm=TRUE)   # most functions allow you to explicitly state how to handle NA
[1] 590
> table(my_array)             # HOWEVER the default action varies from function to function
my_array
100 120 130
  1   3   1
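A small sketch, reusing my_array from the slide, of how to find missing values and how different functions treat them. The useNA argument makes table() count the NAs it would otherwise drop.

```r
my_array <- c(NA, 100, 120, 120, 120, 130, NA)

print(is.na(my_array))               # TRUE where a value is missing
n_missing <- sum(is.na(my_array))    # number of missing values

# mean() propagates NA by default, just like sum()
print(mean(my_array))
clean_mean <- mean(my_array, na.rm = TRUE)
print(clean_mean)

# table() drops NA silently unless asked to count it
counts <- table(my_array, useNA = "ifany")
print(counts)
```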
R help pages
• Each function has its own unique syntax
– Default arguments
– Data structure requirements
– Output options
> ?seq            # brings up help page of seq() function
> ??"sequence"    # searches for all related functions
• Note
> seq(from = 2, to = 100, by = 2)
is clearer than
> seq(2,100,2)
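A brief sketch of why named arguments help with seq(): the same function builds a sequence either from a step size or from a desired length, and the names say which. formals() is one way to look at defaults without opening the help page.

```r
by_step   <- seq(from = 2, to = 100, by = 2)         # fixed step size
by_length <- seq(from = 0, to = 1, length.out = 5)   # fixed number of values

print(head(by_step))
print(by_length)

# Inspect a default argument value programmatically
print(formals(seq.default)$from)
```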
Basic Scripting
• Note pad / text editor
– Within the R GUI
– Open with: File > New Script or Ctrl+N
– Tiling the windows is useful: Windows > Tile
– Useful for keeping all work together
– Scripts can be saved
– Can be used to save a "program"
– Add # comments
– Check individual bits of code
– Ctrl+R runs the whole current line or the selected code
Basic Scripting
• Brackets
– ( ) functions
– [ ] subsets
– { } processes (blocks of code, e.g. the body of a loop)
• Subsets
– Take a subset of an object
– Objects have either 1 x n, or m x n dimensions
> x
[1] 2 5 6 2 6 77 55
> x[5]
[1] 6
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> x[3,4]     # [rows, columns]
[1] 12
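A few more ways to take subsets, sketched with the same values as the slide. The matrix is called m here (an illustrative name) so that it does not overwrite the vector x.

```r
x <- c(2, 5, 6, 2, 6, 77, 55)
print(x[5])          # fifth element
print(x[c(1, 3)])    # first and third elements
print(x[-1])         # everything except the first element
print(x[x > 5])      # elements greater than 5

m <- matrix(1:12, nrow = 3)   # filled column by column
print(m[3, 4])       # row 3, column 4
print(m[2, ])        # all of row 2
print(m[, 4])        # all of column 4
```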
Basic Scripting
• Data input
– Direct input into the console
• scan()
– Reading in data
• read.table / read.csv
– "name.txt"
– "c:\\temp\\name.txt"
– file.choose()
– list.files()
– dir()
> y <- scan()
1: 3
2: 4
3: 12
4: 3
5: 5
6: 2
7: 14
8:
Read 7 items
> dir()
[1] "temp.csv" "temp2.csv" "name.txt"
> y <- read.table("name.txt", header=TRUE, sep="\t")
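A self-contained sketch of reading a tab-separated file. The file is created first in a temporary location with made-up contents, so the column names and values are assumptions rather than the course data.

```r
# Create a small tab-separated file to read back
txt_file <- tempfile(fileext = ".txt")
writeLines(c("Field\tWeight\tsex",
             "A\t22.9\tF",
             "B\t25.5\tM"), txt_file)

lambs <- read.table(txt_file, header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE)
str(lambs)

# scan() can also read from a text string rather than the keyboard
typed <- scan(text = "3 4 12 3 5 2 14")
print(typed)
```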
Basic Scripting
• Data output
– Direct output from the console
• sink()
> sink("sink_tmp.txt")    # start diverting console output to a file
> i <- 1:10
> outer(i, i, "*")
> sink()                  # stop diverting
– Writing out data
• write.table / write.csv
– "name.txt"
– "c:\\temp\\name.txt"
> dir()
[1] "temp.csv" "temp2.csv" "name.txt"
> write.table(y, file="name.txt", col.names=TRUE, sep="\t")
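A minimal sketch of both output routes. The data frame and file locations are invented for illustration; note that write.table() takes the object to write as its first argument and the file name second.

```r
results <- data.frame(id = 1:3, weight = c(22.9, 27.5, 25.5))

# Write a table as tab-separated text
out_file <- tempfile(fileext = ".txt")
write.table(results, file = out_file, sep = "\t",
            row.names = FALSE, quote = FALSE)
written <- readLines(out_file)
print(written)

# sink() diverts printed output to a file until sink() is called again
log_file <- tempfile(fileext = ".txt")
sink(log_file)
print(summary(results$weight))
sink()
print(readLines(log_file))
```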
Basic Scripting
• Adding rows and columns
– Allows objects to be joined, either to an existing object or to make a new object
– cbind() – joins objects side by side, as columns
– rbind() – stacks objects on top of each other, as rows
> y1
     [,1] [,2] [,3]
[1,]    1    3 12.5
[2,]    1    2 13.8
[3,]    1    5 15.3
[4,]    1    4 16.8
> y2
      [,1]
[1,] 0.349
[2,] 0.745
[3,] 0.684
[4,] 0.964
> y3 <- cbind(y1, y2)
> y3
     [,1] [,2] [,3]  [,4]
[1,]    1    3 12.5 0.349
[2,]    1    2 13.8 0.745
[3,]    1    5 15.3 0.684
[4,]    1    4 16.8 0.964
> y3 <- rbind(y1, y2[1:3])
> y3
      [,1]  [,2]   [,3]
[1,] 1.000 3.000 12.500
[2,] 1.000 2.000 13.800
[3,] 1.000 5.000 15.300
[4,] 1.000 4.000 16.800
[5,] 0.349 0.745  0.684
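The same joins as a runnable sketch, with y1 and y2 rebuilt from the values shown on the slide. The names wide and tall are illustrative. The dimensions must fit: y2 has four values but y1 has three columns, which is why only y2[1:3] is added as a row.

```r
y1 <- cbind(1, c(3, 2, 5, 4), c(12.5, 13.8, 15.3, 16.8))
y2 <- c(0.349, 0.745, 0.684, 0.964)

wide <- cbind(y1, y2)        # y2 becomes a fourth column
tall <- rbind(y1, y2[1:3])   # three values become a fifth row

print(dim(wide))
print(dim(tall))
print(tall)
```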
Basic Scripting
• for loops
– loop through a set of commands a given number of times
– very useful, but not always the most efficient option in R
> dim(y)
[1] 10 10
> out <- array(0, c(nrow(y), 1))
> for(i in 1:nrow(y)) {
    y_mean <- mean(y[i, 1:10])    # overwritten on every pass
  }
> y_mean
[1] 0.1974492
> for(i in 1:nrow(y)) {
    out[i] <- mean(y[i, ])        # stored for every row
  }
> out
            [,1]
 [1,] -0.3110800
 [2,] -0.2000344
 [3,]  0.2019573
 [4,]  0.2859823
 [5,]  0.1932523
 [6,]  0.2759323
 [7,] -0.2571102
 [8,] -0.1037983
 [9,]  0.3522018
[10,]  0.1974492
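For comparison, a sketch of the same row-mean calculation written as a loop and with two built-in alternatives. The matrix is random data generated with an arbitrary seed, not the data from the slide.

```r
set.seed(1)                          # arbitrary seed, for reproducibility
y <- matrix(rnorm(100), nrow = 10)

# Loop version: one mean per row, stored as we go
out <- numeric(nrow(y))
for (i in seq_len(nrow(y))) {
  out[i] <- mean(y[i, ])
}

# Built-in alternatives that avoid the explicit loop
by_apply    <- apply(y, 1, mean)
by_rowMeans <- rowMeans(y)

print(all.equal(out, by_apply))
print(all.equal(out, by_rowMeans))
```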
Data Manipulation
• Check data
– dim()
– mydata[1:10, 1:10]
– str()
– summary()
– head()
– tail()
– table()
– etc…
> mydata <- read.table("mydata.txt", header=TRUE, sep="\t")
> dim(mydata)
[1] 642 1470
> mydata[1:10, 1:10]
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    2    2    1    2    1    2    0    1    0     1
 [2,]    0    0    2    2    0    0    1    2    1     2
 [3,]    0    2    2    2    1    1    0    0    2     1
 [4,]    2    0    2    2    2    0    1    2    0     1
 [5,]    2    0    0    2    0    1    1    0    2     0
 [6,]    2    1    2    1    1    0    2    2    1     1
 [7,]    1    1    2    2    1    2    2    2    0     1
 [8,]    0    1    0    0    0    1    1    1    1     1
 [9,]    0    0    1    2    1    2    2    0    0     1
[10,]    1    0    1    1    2    0    1    0    0     1
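The checking functions listed above, sketched on ToothGrowth, a dataset that ships with R. It is only a stand-in here, since mydata.txt is not available.

```r
data(ToothGrowth)   # built-in dataset used as a stand-in

print(dim(ToothGrowth))              # rows and columns
str(ToothGrowth)                     # type of each column
print(head(ToothGrowth, 3))          # first rows
print(tail(ToothGrowth, 3))          # last rows
print(summary(ToothGrowth$len))      # range and quartiles
print(table(ToothGrowth$supp))       # counts per category
print(colSums(is.na(ToothGrowth)))   # missing values per column
```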
Data Manipulation
• Reordering
– Works on a data.frame or matrix (numbers or letters)
– Use: order(), which returns the row positions in sorted order
– index <- order(old[,1], decreasing=TRUE)
> dim(lamb)
[1] 1600 5
> head(lamb)
  Field   Weight sire dam sex
1     A 22.92368    1   1   F
2     A 27.52896    1   1   F
3     A 25.52592    1   1   M
4     A 25.56016    1   1   M
5     A 24.53296    1   2   F
6     A 22.03344    1   2   F
> lamb <- lamb[order(lamb$sex, decreasing=FALSE), ]
> head(lamb)
   Field   Weight sire dam sex
1      A 22.92368    1   1   F
2      A 27.52896    1   1   F
5      A 24.53296    1   2   F
6      A 22.03344    1   2   F
9      A 30.37944    2   1   F
10     A 25.93680    2   1   F
Data Manipulation
• Reordering
– order()
Compact way
> lamb <- lamb[order(lamb$sex, decreasing=FALSE), ]
Expanded way
> index <- order(lamb$sex, decreasing=FALSE)
> head(index)
[1]  1  2  5  6  9 10
> lamb <- lamb[index, ]
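order() also accepts several keys, which is handy for sorting within groups. A sketch on a made-up six-row version of the lamb data (the weights are rounded and the other columns are omitted):

```r
lamb <- data.frame(
  Field  = c("A", "A", "A", "A", "A", "A"),
  Weight = c(22.9, 27.5, 25.5, 25.6, 24.5, 22.0),
  sex    = c("F", "F", "M", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# Sort by sex, then by decreasing Weight within each sex
index <- order(lamb$sex, -lamb$Weight)
print(index)

sorted <- lamb[index, ]
print(sorted)
```

The original row names travel with the rows, so they show where each row came from.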
Data Manipulation
• Replacing
– which()
– index
> class(lamb)
[1] "data.frame"
> head(lamb)
  Field   Weight sire dam sex
1     A 22.92368    1   1   F
2     A 27.52896    1   1   F
3     B 25.52592    1   1   M
Using which() (row numbers)
> index <- which(lamb[,1]=="A")
> head(index)
[1]  1  2  4  6  7 10
> lamb[index, 1] <- "C"
Or using a logical index
> index <- lamb[,1]=="A"
> head(index)
[1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
> lamb[index, 1] <- "C"
Put it together
> lamb[which(lamb[,1]=="A"), 1] <- "C"
> head(lamb)
  Field   Weight sire dam sex
1     C 22.92368    1   1   F
2     C 27.52896    1   1   F
3     B 25.52592    1   1   M
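A runnable sketch of the replacement on a made-up five-row data frame. It assumes Field is stored as character; if it were a factor, "C" would first have to be added as a level, otherwise the assignment gives NA with a warning.

```r
lamb <- data.frame(
  Field  = c("A", "A", "B", "A", "B"),
  Weight = c(22.9, 27.5, 25.5, 25.6, 24.5),
  stringsAsFactors = FALSE
)

is_a <- lamb$Field == "A"    # logical index
print(which(is_a))           # the matching row numbers

lamb$Field[is_a] <- "C"      # replace in place
print(lamb$Field)

# ifelse() builds a recoded copy without modifying the original
recoded <- ifelse(lamb$Field == "C", "A", lamb$Field)
print(recoded)
```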
Data Manipulation
• Replacing
> class(lamb)
[1] "data.frame"
> head(lamb)
  Field   Weight sire dam sex
1     A 22.92368    1   1   F
2     A 27.52896    1   1   F
3     B 25.52592    1   1   M
> index <- lamb[,2] <= 22.000
> table(index)
index
FALSE  TRUE
 1553    47
> lamb[index, 2] <- NA     # NA without quotes, so Weight stays numeric
> which(lamb[,2] >= 20.0 & lamb[,2] <= 21.0)
[1]  214  363  496  842  921  983 1103 1126
> which(lamb[,1]=="A" & lamb[,2] >= 20.0 & lamb[,2] <= 21.0)
[1] 214 363 496
> new_lamb <- lamb[which(lamb[,1]=="A" & lamb[,2] >= 20.0 & lamb[,2] <= 21.0), ]
> new_lamb
    Field Weight sire dam sex
214     A  20.46   27   2   F
363     A  20.08   46   1   M
496     A  20.41   62   2   M
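A sketch of the same ideas on a small made-up data frame: marking values as missing with NA (not the string "NA"), and filtering rows on several conditions. subset() is shown as a more readable equivalent.

```r
lamb <- data.frame(
  Field  = c("A", "B", "A", "A", "B"),
  Weight = c(20.5, 20.8, 23.1, 19.7, 25.0),
  stringsAsFactors = FALSE
)

# Mark implausibly low weights as missing
lamb$Weight[lamb$Weight < 20] <- NA
print(is.numeric(lamb$Weight))   # the column is still numeric

# Combine conditions with &; which() skips rows where the test is NA
keep <- which(lamb$Field == "A" & lamb$Weight >= 20 & lamb$Weight <= 21)
new_lamb <- lamb[keep, ]
print(new_lamb)

# subset() gives the same rows
same <- subset(lamb, Field == "A" & Weight >= 20 & Weight <= 21)
print(same)
```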
Graphics with R: Overview
1. Why graphics?
2. Why graphics in R?
3. The R graphics systems (did you really expect just one?)
4. Graphics basics and examples
5. Customisation of a graphic
6. Overview of different systems and packages
plot(x, y, …)
> ?Formaldehyde
> head(Formaldehyde)
carb optden
1 0.1 0.086
2 0.3 0.269
3 0.5 0.446
4 0.6 0.538
5 0.7 0.626
6 0.9 0.782
> plot(Formaldehyde)
> ?par
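A sketch of customising the basic plot. The labels follow the variable descriptions in ?Formaldehyde; the title, colour, fitted line and the temporary PDF file are illustrative choices. Drawing to a file lets the example run without a display.

```r
# Draw to a temporary PDF so the example also runs without a screen
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

plot(Formaldehyde$carb, Formaldehyde$optden,
     xlab = "Carbohydrate (ml)",
     ylab = "Optical density",
     main = "Formaldehyde calibration",
     pch = 19, col = "blue")

# Add a least-squares line through the points
abline(lm(optden ~ carb, data = Formaldehyde), lty = 2)

dev.off()
```

Leaving out the pdf() and dev.off() calls draws the same figure in the R graphics window.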