R: Because the names of other stat programs don’t make sense, so why should this one?
 The three Ws of R: What, Where, and Why
 Commonly used operators
 Formatting your data for R
 Working with data in R
 Exporting data from R
What is R?
 “R is a language and environment for statistical
computing and graphics.” (http://www.r-project.org)
 It’s a programming language first and a statistical
analysis tool second
 Entirely syntax based, similar to SAS and SPSS syntax
 User can download “packages” which are similar to
SPSS Modules
Where is R?
 Available for download at:
 http://www.r-project.org/
 Works on PCs, Macs, and Linux OS
 Doesn’t require a ton of computer memory (I find
that it runs smoother than both SAS and SPSS)
Why use R?
 It’s an open-source project aka FREE!
 It’s gaining traction in both industry (Google,
Facebook, & Kickstarter) and academia
 Its combination of programming flexibility and
statistical analysis capabilities makes it one of the
more powerful data analysis programs out there
Commonly used operators
 <-
:Assignment operator
:Comment operator
 >, <, ==, | :Boolean operators
 +, -, *, ^
:Mathematical operators
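Here is a minimal sketch of these operators in action. The object names (x, y, and so on) are made up for illustration.

```r
# Assignment: store values in two objects (hypothetical names)
x <- 5
y <- 2

# Mathematical operators
total   <- x + y   # addition
diff_xy <- x - y   # subtraction
prod_xy <- x * y   # multiplication
power   <- x ^ y   # exponentiation

# Boolean operators return TRUE or FALSE
is_bigger <- x > y
is_equal  <- x == y               # == tests equality; <- assigns
either    <- (x < y) | (x == 5)   # | means "or"
```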
Formatting your data for R: A Brief Intro
 R can read SAS, SPSS, STATA, txt files, and csv files
 I recommend that you store your data in a csv file
 R can easily read csv files
 Csv files can be imported to and exported from SAS and SPSS
 Other statistical programs can easily read csv files
 I write all of my code in Notepad (more habit than
anything else), but R has many different GUIs
Formatting your data for R: Three easy steps
 1) Turn your data file into a csv file
 2) Use the read.csv() function
 Dataset <- read.csv('Dataset location.csv')
 3) Dataset is now a user-defined object (in this
particular case it’s a dataframe in R) that contains
all of your data
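The three steps can be sketched as follows. To keep the example runnable anywhere, it first writes a tiny csv to a temporary location. The file contents and the column names Var1 and Var11 are assumptions for illustration, not the real dataset.

```r
# Step 1 (stand-in): create a small csv file in a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Var1,Var11", "3,Female", "5,Male", "4,Female"), csv_path)

# Step 2: read the csv file into R
Dataset <- read.csv(csv_path)

# Step 3: Dataset is now a user-defined object, here a data frame
class(Dataset)
```

With your own data you would pass the location of your csv file to read.csv() instead of csv_path.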
Formatting your data for R: Common Mistakes (That
I’ve made 100 times over)
 R treats \ (the backslash) as an escape character inside
strings, so when you write the location of your dataset
you have to use either / or \\
 R is case sensitive, so Dataset and dataset are different
objects in R speak. Depending on your operating system,
'C:\\Dataset.csv' and 'C:\\dataset.csv' may not be the same file either
 Always make sure you include the file extension
(.csv, .txt, .whatever)!
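A short sketch of the path and case-sensitivity points above. The location C:/Data/Dataset.csv is hypothetical, and nothing is read from disk here.

```r
# Two equivalent ways to write a Windows path in an R string
path_forward <- "C:/Data/Dataset.csv"
path_escaped <- "C:\\Data\\Dataset.csv"   # each \\ is stored as one backslash

# A single backslash starts an escape sequence: "\n" is one newline character
newline_length <- nchar("\n")

# file.path() joins the pieces with forward slashes for you
path_built <- file.path("C:", "Data", "Dataset.csv")

# Object names are case sensitive: these are two different objects
Dataset <- 1
dataset <- 2
```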
Working with data in R: Things to check
 I always check the dimensions of my dataset
 dim(Dataset) – this will return two numbers: row x column.
Rows = number of cases and columns = number of variables
 Check the names of your dataset
 names(Dataset)
 Check the descriptive statistics for anything out of
the ordinary:
 Notice the brackets?
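A minimal sketch of these checks on a made-up data frame. The data are invented, and using summary() on a bracketed subset is my assumption about what the check looked like.

```r
# Hypothetical data standing in for the csv file
Dataset <- data.frame(
  Var1  = c(3, 5, 4, NA, 2),
  Var2  = c(10, 12, 9, 11, 40),
  Var11 = c("Female", "Male", "Female", "Male", "Female")
)

dim(Dataset)     # number of rows, then number of columns
names(Dataset)   # variable names

# Descriptive statistics for the first two columns only;
# the square brackets pick which columns to summarize
summary(Dataset[, 1:2])
```

In output like this, a maximum far above the rest of the values (the 40 in Var2) is the kind of thing worth a second look.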
Working with data in R: Subsetting your data
 First, begin thinking about your dataset as a matrix
 Rows = cases and columns = variables
 Dataset[5,1] means return the observation stored in row 5
column 1
 Dataset[,1] means return all of the rows in column 1
 Dataset[2,1:5] means return all of the observations in row
2 and columns 1 through 5
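The three bracket patterns above, sketched on a small made-up data frame (the values and the names one_cell, one_column, and one_row are assumptions for illustration):

```r
# Hypothetical data: 5 cases (rows) and 5 variables (columns)
Dataset <- data.frame(
  Var1 = c(3, 5, 4, 1, 2),
  Var2 = c(10, 12, 9, 11, 14),
  Var3 = c(7, 6, 5, 4, 3),
  Var4 = c(1, 1, 2, 2, 3),
  Var5 = c(20, 30, 40, 50, 60)
)

one_cell   <- Dataset[5, 1]     # row 5, column 1
one_column <- Dataset[, 1]      # every row of column 1, returned as a vector
one_row    <- Dataset[2, 1:5]   # row 2, columns 1 through 5, a one-row data frame
```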
Working with data in R: Subsetting your data
 Alternatively, you can reference a column directly
by using the $ operator:
Dataset$Var1 will return the entire Var1 column from Dataset
 What if I want to filter by some variable?
 ds.Female <- Dataset[Dataset$Var11 == 'Female',]
 The above creates a dataframe called ds.Female that
keeps only the cases where Var11 equals 'Female' (so every 'Male' case is filtered out)
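A minimal sketch of the $ operator and the filter, using invented values for Var1 and Var11:

```r
# Hypothetical data
Dataset <- data.frame(
  Var1  = c(3, 5, 4, 1),
  Var11 = c("Female", "Male", "Female", "Male")
)

# Pull one column out by name
var1_values <- Dataset$Var1

# Keep the rows where Var11 equals "Female", and keep every column
ds.Female <- Dataset[Dataset$Var11 == "Female", ]
```

The comparison inside the brackets produces one TRUE or FALSE per row, and only the TRUE rows are kept.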
Working with data in R: Reverse Coding
 What do you do if you have some variables that
need to be reverse coded?
(1 + highest scale value - Variable) is the general formula for a scale that starts at 1
For a 7-point scale: Dataset$Var12 <- 8 - Dataset$Var10. This does two
things: 1) creates another column in Dataset labeled
Var12 and 2) sets Var12 equal to 8 - Var10
Check with cor(Dataset$Var10, Dataset$Var12,
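A sketch of reverse coding plus the correlation check, assuming a 1-to-7 scale and invented responses. The use = "complete.obs" argument is my assumption for handling missing values. A correctly reversed item should correlate -1 with the original.

```r
# Hypothetical item answered on a 1-to-7 scale, with one missing response
Dataset <- data.frame(Var10 = c(1, 4, 7, 2, NA, 6))

highest <- 7   # highest scale value (assumed)

# General formula: 1 + highest scale value - Variable
Dataset$Var12 <- (1 + highest) - Dataset$Var10

# Check: the reversed item is a perfect negative linear function of the original
check <- cor(Dataset$Var10, Dataset$Var12, use = "complete.obs")
```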
Working with data in R: Internal R Functions
 mean(Dataset$Var1, na.rm=TRUE)
 sd(Dataset$Var4, na.rm=TRUE)
 min(Dataset$Var5, na.rm=TRUE) and max(Dataset$Var5, na.rm=TRUE)
 cor(Dataset, use='complete.obs')
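These functions can be tried on a small invented data frame. The values and the result names (m, s, lo, hi, r) are assumptions for illustration.

```r
# Hypothetical numeric data with a couple of missing values
Dataset <- data.frame(
  Var1 = c(2, 4, NA, 6, 5, 3),
  Var4 = c(1, 2, 3, 4, 5, 6),
  Var5 = c(10, 15, 30, 20, NA, 25)
)

# na.rm = TRUE drops missing values before computing
m  <- mean(Dataset$Var1, na.rm = TRUE)
s  <- sd(Dataset$Var4, na.rm = TRUE)
lo <- min(Dataset$Var5, na.rm = TRUE)
hi <- max(Dataset$Var5, na.rm = TRUE)

# Correlation matrix using only the rows with no missing values.
# cor() needs numeric columns, so drop any text columns first.
r <- cor(Dataset, use = "complete.obs")
```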
Working with data in R: Internal R Functions
 modlm <- lm(Var2 ~ Var3, data=Dataset)
 Ordinary Least Squares Regression, regressing variable 2 onto
variable 3
 modanova <- lm(Var4 ~ as.factor(Var11), data=Dataset)
 OLS Regression, regressing variable 4 onto the categorical
gender variable – This is an ANOVA!
 modanova1 <- aov(Var4 ~ as.factor(Var11), data=Dataset)
 aov is R’s built-in ANOVA function
 dif <- TukeyHSD(modanova1)
 Tukey’s Honestly Significant Difference Test
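A sketch of the models above on simulated data. The data are made up, and Var11 is given three hypothetical groups (A, B, C) instead of the two gender categories so that Tukey's test has more than one pair to compare.

```r
set.seed(1)   # make the simulated data reproducible

# Hypothetical data: 60 cases in three groups of 20
Dataset <- data.frame(
  Var3  = rnorm(60),
  Var11 = rep(c("A", "B", "C"), each = 20)
)
Dataset$Var2 <- 2 + 0.5 * Dataset$Var3 + rnorm(60)
Dataset$Var4 <- rnorm(60, mean = rep(c(0, 1, 2), each = 20))

# OLS regression of Var2 on Var3
modlm <- lm(Var2 ~ Var3, data = Dataset)
summary(modlm)

# The same ANOVA fitted two ways: lm() with a factor predictor, and aov()
modanova  <- lm(Var4 ~ as.factor(Var11), data = Dataset)
modanova1 <- aov(Var4 ~ as.factor(Var11), data = Dataset)
anova(modanova)      # F test from the regression fit
summary(modanova1)   # F test from aov

# Pairwise group comparisons
dif <- TukeyHSD(modanova1)
dif
```

The F tests printed by anova(modanova) and summary(modanova1) should agree, which is the sense in which the regression "is an ANOVA".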
Exporting Data from R
 write.csv(Dataset, 'Location.csv')
 BOOM goes the dynamite
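A minimal export sketch. It writes to a temporary file so it can run anywhere. The data and the names out_path and check are made up, and row.names = FALSE is an optional extra I've added.

```r
# Hypothetical data to export
Dataset <- data.frame(
  Var1  = c(3, 5, 4),
  Var11 = c("Female", "Male", "Female")
)

out_path <- tempfile(fileext = ".csv")   # stand-in for 'Location.csv'

# row.names = FALSE leaves the row numbers out of the file
write.csv(Dataset, out_path, row.names = FALSE)

# Read the file back in to confirm the export worked
check <- read.csv(out_path)
```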
Thank you!
