R is a language and environment for statistical computing and graphics.
R is available as Free Software under the terms of the Free Software Foundation's
GNU General Public License in source code form.
It compiles and runs on a wide variety of UNIX platforms and similar systems
(including FreeBSD and Linux), Windows and macOS.
R can be extended (easily) via packages. There are about eight packages supplied
with the R distribution and many more are available through the CRAN family of
Internet sites covering a very wide range of modern statistics.
R is a fully planned and coherent system that includes:
• an effective data handling and storage facility,
• a suite of operators for calculations on arrays (matrices),
• a large, coherent, integrated collection of intermediate tools for data analysis,
• graphical facilities for data analysis and display (on-screen or on hardcopy),
• a well-developed, simple and effective programming language which
includes conditionals, loops, user-defined recursive functions and input and
output facilities.
http://www.r-project.org/
Install packages in RStudio
Essential commands in R
Example in R
Vectors
# Numeric vector:
> c(2,3,5,7,9)
[1] 2 3 5 7 9
# Character vector:
> c("Huey","Dewey","Louie")
[1] "Huey" "Dewey" "Louie"
# Logical vector:
> c(T,T,F,T)
[1] TRUE TRUE FALSE TRUE
# Functions that create vectors:
# c - "concatenate"
> c(42,57,12,39)
[1] 42 57 12 39
# seq - "sequence"
> seq(4,9)
[1] 4 5 6 7 8 9
# rep - "replicate"
> rep(1:2,5)
[1] 1 2 1 2 1 2 1 2 1 2
> rep(1:2,c(3,4))
[1] 1 1 1 2 2 2 2
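The vector-building functions above take a few further arguments that are often useful. The following is a minimal sketch; the variable names `s1`, `s2`, `r1` and `r2` are chosen here for illustration only.

```r
# seq() with a step size (by) or with a fixed number of points (length.out)
s1 <- seq(4, 10, by = 2)           # 4 6 8 10
s2 <- seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# rep() with each= repeats every element before moving on to the next one
r1 <- rep(1:2, each = 3)           # 1 1 1 2 2 2

# The colon operator is shorthand for a sequence with step 1
r2 <- 4:9                          # 4 5 6 7 8 9

print(s1)
print(s2)
print(r1)
print(r2)
```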
Factors
Factors – a data structure that makes it possible to assign meaningful names
to the categories.
> pain=c(0,3,2,2,1)
> fpain=factor(pain,levels=0:3)
> levels(fpain)=c("none","mild","medium","severe")
> fpain
[1] none severe medium medium mild
Levels: none mild medium severe
> levels(fpain)
[1] "none" "mild" "medium" "severe"
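The levels and their names can also be supplied in a single call through the `labels` argument of `factor()`. This sketch reuses the `pain` values from above and should give the same factor as the two-step version.

```r
pain <- c(0, 3, 2, 2, 1)

# levels= lists the codes that occur in the data, labels= names them in the same order
fpain <- factor(pain, levels = 0:3,
                labels = c("none", "mild", "medium", "severe"))

print(fpain)
print(table(fpain))        # number of observations in each category
print(as.integer(fpain))   # underlying integer codes: 1 4 3 3 2
```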
Matrices
> x=1:12
> dim(x)=c(3,4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> x=matrix(1:12,nrow=3,byrow=T)
> rownames(x)=LETTERS[1:3]
> x
  [,1] [,2] [,3] [,4]
A    1    2    3    4
B    5    6    7    8
C    9   10   11   12
> t(x)
     A B  C
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
LETTERS: a built-in variable that contains the capital letters A-Z.
t(x): the transpose of the matrix x.
# Use the functions cbind and rbind to "bind" vectors together
# columnwise or rowwise.
> cbind(A=1:4,B=5:8,C=9:12)
     A B  C
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> rbind(A=1:4,B=5:8,C=9:12)
  [,1] [,2] [,3] [,4]
A    1    2    3    4
B    5    6    7    8
C    9   10   11   12
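As a quick check on how `cbind`, `rbind` and `t` relate, the sketch below builds both matrices and compares them. The names `m_cols` and `m_rows` are illustrative only.

```r
m_cols <- cbind(A = 1:4, B = 5:8, C = 9:12)   # vectors become columns (4 x 3)
m_rows <- rbind(A = 1:4, B = 5:8, C = 9:12)   # vectors become rows (3 x 4)

print(dim(m_cols))                 # 4 3
print(dim(m_rows))                 # 3 4

# Transposing the row-bound matrix gives the same values as the column-bound one
print(all(t(m_rows) == m_cols))

# Elements can be selected by row name and column number
print(m_rows["B", 3])              # 7
```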
Data frame: a list of vectors and/or factors of the same length that are
related “across”, so that data in the same position come from the same
experimental unit (subject, animal, etc.).
> conc=c(5,12,20,24,35,40)
> vol=c(20,25,33,40,50,55)
> d=data.frame(conc,vol)
> d
  conc vol
1    5  20
2   12  25
3   20  33
4   24  40
5   35  50
6   40  55
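Because a data frame is a list of equal-length columns, each column can be pulled out by name and rows can be filtered with a logical condition. A minimal sketch using the same `conc` and `vol` values:

```r
conc <- c(5, 12, 20, 24, 35, 40)
vol  <- c(20, 25, 33, 40, 50, 55)
d <- data.frame(conc, vol)

str(d)                     # compact overview: 6 observations of 2 variables
print(d$vol)               # one column, returned as a plain vector
print(d[d$conc > 20, ])    # only the rows where conc exceeds 20
print(nrow(d))             # 6
print(ncol(d))             # 2
```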
Data: “Soil”
Soil properties of two adjacent locations on Wimbledon Common: a sandy
lowland heath (site 1) and adjoining spoil mounds of calcareous clay (site 2).
Parameters:
Site - site number
pH
cond - electrical conductivity of soil solution
OM - percentage organic matter composition of soil
H2O - percentage water content of soil after drying to 105°F
A comment in R is marked with #
# Import a .txt file:
# Import a .csv file:
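The import commands themselves did not survive on the slide, so the following is only a sketch of one common approach using `read.table` and `read.csv`. The temporary files stand in for real files on disk; their names and locations are assumptions, not the original paths.

```r
# Hypothetical stand-in for a data file: the Soil values written line by line
soil_lines <- c(
  "Site rep pH cond OM H2O",
  "1 1 4.5 55 26 17",
  "1 1 5.4 60 16 21",
  "1 3 5.1 49 NA 18",
  "1 4 4.8 55 27 18",
  "2 1 7.6 155 5 25",
  "2 2 7.8 124 NA 35",
  "2 3 7.2 141 6 32",
  "2 4 7.3 166 8 29"
)
txt_file <- tempfile(fileext = ".txt")
writeLines(soil_lines, txt_file)

# Whitespace-separated text file; header = TRUE takes column names from the first line
Soil <- read.table(txt_file, header = TRUE)

# Comma-separated file: write a copy as .csv, then read it back with read.csv
csv_file <- tempfile(fileext = ".csv")
write.csv(Soil, csv_file, row.names = FALSE)
Soil_csv <- read.csv(csv_file)

print(Soil)
```

With a real file, the temporary file name would be replaced by the path to that file.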
> Soil
  Site rep  pH cond OM H2O
1    1   1 4.5   55 26  17
2    1   1 5.4   60 16  21
3    1   3 5.1   49 NA  18
4    1   4 4.8   55 27  18
5    2   1 7.6  155  5  25
6    2   2 7.8  124 NA  35
7    2   3 7.2  141  6  32
8    2   4 7.3  166  8  29
#Display the column names of “Soil” data:
> names(Soil)
[1] "Site" "rep" "pH" "cond" "OM" "H2O"
#Display the row names:
> rownames(Soil)
[1] "1" "2" "3" "4" "5" "6" "7" "8"
#Display the dimensions of the Soil data:
> dim(Soil)
[1] 8 6
# 8 rows (observations) and 6 columns (variables)
#Select the second column of the data:
> Soil[,2]
[1] 1 1 3 4 1 2 3 4
#or:
> Soil$rep
[1] 1 1 3 4 1 2 3 4
#Select the third row of the data:
> Soil[3,]
  Site rep  pH cond OM H2O
3    1   3 5.1   49 NA  18
#Select rows 2,4, and 5:
> Soil[c(2,4,5),]
Site rep pH cond OM H2O
2 1 1 5.4 60 16 21
4 1 4 4.8 55 27 18
5 2 1 7.6 155 5 25
#Display the length of the second column:
> length(Soil[,2])
[1] 8
#Add a new column log.pH containing the logarithmic transform of pH:
> Soil2=transform(Soil,log.pH=log(Soil$pH))
> Soil2
Site rep pH cond OM H2O log.pH
1 1 1 4.5 55 26 17 1.504077
2 1 1 5.4 60 16 21 1.686399
3 1 3 5.1 49 NA 18 1.629241
4 1 4 4.8 55 27 18 1.568616
5 2 1 7.6 155 5 25 2.028148
6 2 2 7.8 124 NA 35 2.054124
7 2 3 7.2 141 6 32 1.974081
8 2 4 7.3 166 8 29 1.987874
#Delete the third column (pH) of the “Soil2” data:
> Soil3=Soil2[,-3]
> Soil3
  Site rep cond OM H2O   log.pH
1    1   1   55 26  17 1.504077
2    1   1   60 16  21 1.686399
3    1   3   49 NA  18 1.629241
4    1   4   55 27  18 1.568616
5    2   1  155  5  25 2.028148
6    2   2  124 NA  35 2.054124
7    2   3  141  6  32 1.974081
8    2   4  166  8  29 1.987874
#Select the first four columns of the “Soil” data:
> Soil4=Soil[,1:4]
> Soil4
Site rep pH cond
1 1 1 4.5 55
2 1 1 5.4 60
3 1 3 5.1 49
4 1 4 4.8 55
5 2 1 7.6 155
6 2 2 7.8 124
7 2 3 7.2 141
8 2 4 7.3 166
#Obtain a subset of the “Soil” data with cond >100:
> Soil5=subset(Soil,Soil$cond>100)
> Soil5
  Site rep  pH cond OM H2O
5    2   1 7.6  155  5  25
6    2   2 7.8  124 NA  35
7    2   3 7.2  141  6  32
8    2   4 7.3  166  8  29
#Obtain a subset of the “Soil” data with cond >100 and H2O<32:
> Soil6=subset(Soil,Soil$cond>100&Soil$H2O<32)
> Soil6
Site rep pH cond OM H2O
5 2 1 7.6 155 5 25
8 2 4 7.3 166 8 29
#Obtain a subset of the “Soil” data with no missing values (NA) for OM:
> Soil7=subset(Soil, !is.na(Soil$OM))
> Soil7
Site rep pH cond OM H2O
1 1 1 4.5 55 26 17
2 1 1 5.4 60 16 21
4 1 4 4.8 55 27 18
5 2 1 7.6 155 5 25
7 2 3 7.2 141 6 32
8 2 4 7.3 166 8 29
#Obtain a subset of the “Soil” data with missing values (NA) for OM:
> Soil8=subset(Soil,is.na(Soil$OM))
> Soil8
Site rep pH cond OM H2O
3 1 3 5.1 49 NA 18
6 2 2 7.8 124 NA 35
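The subsets above test only the OM column. To drop rows with a missing value in any column, one option is `complete.cases()` or `na.omit()`. The sketch below rebuilds the Soil data from the table shown earlier so that it runs on its own; for this data the result matches Soil7, because OM is the only column containing NA.

```r
Soil <- data.frame(
  Site = c(1, 1, 1, 1, 2, 2, 2, 2),
  rep  = c(1, 1, 3, 4, 1, 2, 3, 4),
  pH   = c(4.5, 5.4, 5.1, 4.8, 7.6, 7.8, 7.2, 7.3),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166),
  OM   = c(26, 16, NA, 27, 5, NA, 6, 8),
  H2O  = c(17, 21, 18, 18, 25, 35, 32, 29)
)

# complete.cases() returns TRUE for rows that have no NA in any column
complete <- Soil[complete.cases(Soil), ]
print(complete)

# na.omit() is a shortcut that removes the incomplete rows directly
print(nrow(na.omit(Soil)))   # 6
```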
#Identify which observations have pH<7:
> which(Soil$pH<7)
[1] 1 2 3 4
# observations (rows) 1, 2, 3, and 4 have pH<7.
#Identify which observations have missing values for OM:
> which(is.na(Soil$OM))
[1] 3 6
#observations 3 and 6 have missing values for OM.
#Identify which observation has pH=5.4:
> which(Soil$pH==5.4)
[1] 2
#Identify which observations are not from Site 1:
> which(Soil$Site!=1)
[1] 5 6 7 8
#Order “Soil” data by pH:
# Increasing:
> Soil9=Soil[order(Soil$pH),]
> Soil9
Site rep pH cond OM H2O
1 1 1 4.5 55 26 17
4 1 4 4.8 55 27 18
3 1 3 5.1 49 NA 18
2 1 1 5.4 60 16 21
7 2 3 7.2 141 6 32
8 2 4 7.3 166 8 29
5 2 1 7.6 155 5 25
6 2 2 7.8 124 NA 35
# Decreasing:
> Soil10=Soil[order(-Soil$pH),]
> Soil10
Site rep pH cond OM H2O
6 2 2 7.8 124 NA 35
5 2 1 7.6 155 5 25
8 2 4 7.3 166 8 29
7 2 3 7.2 141 6 32
2 1 1 5.4 60 16 21
3 1 3 5.1 49 NA 18
4 1 4 4.8 55 27 18
1 1 1 4.5 55 26 17
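Negating the column works for numeric data; `order()` also accepts `decreasing = TRUE`, and several columns can be passed to break ties. A minimal sketch, with the Soil data rebuilt from the table shown earlier and the result names `by_pH_desc` and `by_cond_pH` chosen for illustration:

```r
Soil <- data.frame(
  Site = c(1, 1, 1, 1, 2, 2, 2, 2),
  rep  = c(1, 1, 3, 4, 1, 2, 3, 4),
  pH   = c(4.5, 5.4, 5.1, 4.8, 7.6, 7.8, 7.2, 7.3),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166),
  OM   = c(26, 16, NA, 27, 5, NA, 6, 8),
  H2O  = c(17, 21, 18, 18, 25, 35, 32, 29)
)

# Same ordering as order(-Soil$pH), written with the decreasing argument
by_pH_desc <- Soil[order(Soil$pH, decreasing = TRUE), ]

# Sort by cond first; rows 1 and 4 tie at cond = 55, so pH decides their order
by_cond_pH <- Soil[order(Soil$cond, Soil$pH), ]

print(rownames(by_pH_desc))   # "6" "5" "8" "7" "2" "3" "4" "1"
print(rownames(by_cond_pH))   # "3" "1" "4" "2" "6" "7" "5" "8"
```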
#Save “Soil10” data from the R console to your computer:
> write.table(Soil10,file="E:/Multivariate_analysis/pH_Order_Soil.csv",
row.names=F,col.names=names(Soil10),quote=F,sep=",")
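For comma-separated output, `write.csv()` is a convenience wrapper around `write.table()`. The sketch below writes to a temporary directory rather than the drive used above (the location is an assumption made so the example runs anywhere) and reads the file back to confirm its contents.

```r
Soil <- data.frame(
  Site = c(1, 1, 1, 1, 2, 2, 2, 2),
  rep  = c(1, 1, 3, 4, 1, 2, 3, 4),
  pH   = c(4.5, 5.4, 5.1, 4.8, 7.6, 7.8, 7.2, 7.3),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166),
  OM   = c(26, 16, NA, 27, 5, NA, 6, 8),
  H2O  = c(17, 21, 18, 18, 25, 35, 32, 29)
)
Soil10 <- Soil[order(-Soil$pH), ]

# Hypothetical output location inside R's temporary directory
out_file <- file.path(tempdir(), "pH_Order_Soil.csv")

# row.names = FALSE leaves out the row labels; missing values are written as NA
write.csv(Soil10, file = out_file, row.names = FALSE, quote = FALSE)

# Read the file back to check what was saved
check <- read.csv(out_file)
print(check)
```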
#Load a package in R (after installing it):
> library(MASS)
# load the package called MASS
# Get help with R functions:
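The help commands were not preserved on the slide. The standard ways to open a help page are `help()` and the `?` shorthand; the functions `mean` and `sd` are used here only as examples.

```r
help("mean")      # open the help page for the function mean()
?sd               # shorthand for help("sd")
print(args(sd))   # show a function's arguments without opening the full page
```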
#Calculate mean, standard deviation, variance, median, sum, and maximum
#and minimum values for “cond” in “Soil” data:
> mean(Soil$cond)
[1] 100.625
> sum(Soil$cond)
[1] 805
> sd(Soil$cond)
[1] 50.54824
> max(Soil$cond)
[1] 166
> var(Soil$cond)
[1] 2555.125
> min(Soil$cond)
[1] 49
> median(Soil$cond)
[1] 92
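These functions return NA when the input contains missing values, as the OM column does. Passing `na.rm = TRUE` drops the missing values first. A minimal sketch, with the Soil data rebuilt from the table shown earlier:

```r
Soil <- data.frame(
  Site = c(1, 1, 1, 1, 2, 2, 2, 2),
  rep  = c(1, 1, 3, 4, 1, 2, 3, 4),
  pH   = c(4.5, 5.4, 5.1, 4.8, 7.6, 7.8, 7.2, 7.3),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166),
  OM   = c(26, 16, NA, 27, 5, NA, 6, 8),
  H2O  = c(17, 21, 18, 18, 25, 35, 32, 29)
)

print(mean(Soil$OM))                 # NA, because OM contains missing values
print(mean(Soil$OM, na.rm = TRUE))   # mean of the six observed values
print(range(Soil$cond))              # minimum and maximum together: 49 166
print(summary(Soil$cond))            # min, quartiles, median, mean and max in one call
```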
Graphics in R
Example of multivariate data