Introduction to R and RStudio
Project Management With RStudio
Seeking Help
Data Structures
Subsetting Data
Control Flow
Creating Publication-Quality Graphics
Vectorization
Functions Explained
Writing Data
Split-Apply-Combine
Dataframe Manipulation with dplyr
Dataframe Manipulation with tidyr
Producing Reports With knitr
Writing Good Software
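Introduction to R and RStudio
- Basic arithmetic operations follow the standard order of precedence:
  - Brackets: ( and )
  - Exponents: ^ or **
  - Divide: /
  - Multiply: *
  - Add: +
  - Subtract: -
- Scientific notation is available, e.g. 2e-3
- Anything to the right of a # is a comment, R will ignore this!
- Functions are called as function_name(). Expressions inside the brackets are evaluated before being passed to the function, and functions can be nested.
- Mathematical functions: exp, sin, log, log10, log2 etc.
- Comparison operators: <, <=, >, >=, ==, !=
- Use all.equal rather than == to compare decimal numbers!
- <- is the assignment operator. Anything to the right is evaluated, then stored in a variable named to the left.
- ls lists all variables and functions you’ve created
- rm can be used to remove them
- Use = when assigning values to function arguments.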
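The points above on arithmetic, comparison, and assignment can be tried out in a few lines. This is a minimal sketch; the variable names (result, small, root, rounded) are illustrative assumptions, not part of the lesson.

```r
# Brackets are evaluated first, then the exponent, then the multiplication
result <- (1 + 2) * 3^2        # 3 * 9 = 27

# ** is accepted as a synonym for ^
2 ** 3 == 2^3                  # TRUE

# Scientific notation
small <- 2e-3                  # 0.002

# Nested function calls are evaluated from the inside out
root <- sqrt(exp(log(16)))     # approximately 4

# Decimal numbers: == can fail on rounding error where all.equal succeeds
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE

# <- assigns to a variable, = names a function argument
rounded <- round(3.14159, digits = 2)

ls()         # lists the variables created so far
rm(small)    # removes one of them
```

Wrapping all.equal in isTRUE is a common pattern, because all.equal returns a description of the difference rather than FALSE when the values differ.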
Project Management With RStudio
- Use the packrat package to create self-contained projects
- install.packages to install packages from CRAN
- library to load a package into R
- packrat::status to check whether all packages referenced in your scripts have been installed.
Seeking Help
- ?function_name or help(function_name) opens the help page for a function
- Use quotes for special operators, e.g. ?"+"
- ?dput will dump data you are working from so others can load it easily.
- sessionInfo() will give details of your setup that others may need for debugging.
Data Structures
Individual values in R must be one of 5 data types; multiple values can be grouped in data structures.
Data types
- typeof(object) gives information about an item’s data type.
- ?numeric real (decimal) numbers
- ?integer whole numbers only
- ?character text
- ?complex complex numbers
- ?logical TRUE or FALSE values
Special types:
- ?NA missing values
- ?NaN “not a number” for undefined values (e.g. 0/0).
- ?Inf, -Inf infinity.
- ?NULL a data structure that doesn’t exist
NA can occur in any atomic vector. NaN and Inf can only occur in numeric (double) or complex vectors; integer vectors cannot hold them. Atomic vectors are the building blocks for all other data structures. A NULL value stands in for an entire data structure, although it can also occur as a list element.
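As a quick check of the data types and special values listed above, the following sketch calls typeof on one literal of each kind. The names nan_value and inf_value are illustrative assumptions.

```r
# The five main data types
typeof(1.5)       # "double" (what the list above calls numeric)
typeof(2L)        # "integer"
typeof("text")    # "character"
typeof(1 + 2i)    # "complex"
typeof(TRUE)      # "logical"

# Special values
nan_value <- 0 / 0        # NaN: the result is undefined
inf_value <- 1 / 0        # Inf
is.nan(nan_value)         # TRUE
is.na(c(1, NA, 3))        # FALSE TRUE FALSE

# NA takes on the type of the vector it sits in
typeof(c("a", NA))        # "character"

# NULL disappears from an atomic vector but is kept as a list element
length(c(1, NULL, 2))     # 2
length(list(1, NULL, 2))  # 3
```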
Basic data structures in R:
- atomic ?vector: can only contain one type
- ?list: containers for other objects
- ?data.frame: two-dimensional objects whose columns can contain different types of data
- ?matrix: two-dimensional objects that can contain only one type of data
- ?factor: vectors that contain predefined categorical data
- ?array: multi-dimensional objects that can only contain one type of data
Remember that matrices are really atomic vectors under the hood, and that data.frames are really lists under the hood (this explains some of the weirder behaviour of R).
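One way to see the "under the hood" remark is to inspect a small matrix and data.frame directly. This is only a sketch; m and df are made-up example objects.

```r
# A matrix is an atomic vector with a dim attribute
m <- matrix(1:6, nrow = 2)
typeof(m)        # "integer": the type of the underlying vector
length(m)        # 6: counts elements, not rows
m[4]             # 4: a single index treats m as a plain vector
attributes(m)    # just the dimensions

# A data.frame is a list with one element per column
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
typeof(df)       # "list"
is.list(df)      # TRUE
length(df)       # 2: the number of columns
```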
Vectors
- ?vector() All items in a vector must be the same type.
- Items can be converted from one type to another using coercion.
- The concatenate function c() will append items to a vector.
- seq(from=0, to=1, by=1) will create a sequence of numbers.
- Items in a vector can be named using the names() function.
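The vector operations above can be sketched as follows. The values and names (x, mixed, s) are illustrative assumptions.

```r
x <- c(1, 2, 3)
x <- c(x, 4)                 # c() appends an item

# Mixing types triggers implicit coercion to the most general type
mixed <- c(1, "a", TRUE)
typeof(mixed)                # "character"

# Explicit coercion
as.integer("42")             # 42

# A regular sequence
s <- seq(from = 0, to = 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00

# Naming items, then looking one up by name
names(x) <- c("a", "b", "c", "d")
x["b"]                       # the item named "b", which is 2
```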
Factors
- ?factor() Factors are a data structure designed to store categorical data.
- levels() shows the valid values that can be stored in a vector of type factor.
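A short sketch of how levels constrain a factor. The category names here are invented for illustration.

```r
sizes <- factor(c("small", "large", "small", "medium"),
                levels = c("small", "medium", "large"))

levels(sizes)        # "small" "medium" "large"
as.integer(sizes)    # 1 3 1 2: each value is stored as a level index

# A value outside the declared levels becomes NA
unknown <- factor(c("small", "huge"), levels = levels(sizes))
is.na(unknown)       # FALSE TRUE
```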
Lists
- ?list() Lists are a data structure designed to store data of different types.
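For instance, a single list can hold a character value, a numeric vector, and a logical value side by side. The element names below (label, values, flag) are hypothetical.

```r
record <- list(label = "example", values = c(1.5, 2.5), flag = TRUE)

str(record)               # compact overview of every element
sapply(record, typeof)    # "character" "double" "logical"
length(record)            # 3
```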
Matrices
- ?matrix() Matrices are a data structure designed to store 2-dimensional data.
Data Frames
- ?data.frame is a key data structure. It is a list of equal-length vectors.
- cbind() will add a column (vector) to a data.frame.
- rbind() will add a row (list) to a data.frame.
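A minimal sketch of growing a data.frame with cbind() and rbind(). The column names and values are made up for illustration.

```r
df <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)

# Add a column: a vector with one value per existing row
df <- cbind(df, score = c(9.5, 7.0))

# Add a row: a list, because the columns have different types
df <- rbind(df, list(id = 3L, name = "c", score = 8.2))

df
```

The new row is passed as a list rather than a vector so that each value keeps its own type instead of being coerced to a common one.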
Useful functions for querying data structures:
- ?str structure, prints out a summary of the whole data structure
- ?typeof tells you the type inside an atomic vector
- ?class what is the data structure?
- ?head print the first n elements (rows for two-dimensional objects)
- ?tail print the last n elements (rows for two-dimensional objects)
- ?rownames, ?colnames, ?dimnames retrieve or modify the row names and column names of an object.
- ?names retrieve or modify the names of an atomic vector or list (or columns of a data.frame).
- ?length get the number of elements in an atomic vector
- ?nrow, ?ncol, ?dim get the dimensions of an n-dimensional object (these return NULL for atomic vectors and lists).
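The query functions above can be tried on a small data.frame. The object df and its columns are illustrative assumptions.

```r
df <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 9.9))

str(df)              # summary of the whole structure
class(df)            # "data.frame"
typeof(df$value)     # "double"
head(df, n = 2)      # first two rows
tail(df, n = 2)      # last two rows
names(df)            # "id" "value"
dim(df)              # 5 2
nrow(df)             # 5
ncol(df)             # 2
length(df$id)        # 5: elements in one column

# Atomic vectors have no dimensions
dim(1:3)             # NULL
```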
- read.csv to read in data in a regular structure
- sep argument to specify the separator
- header=TRUE if there is a header row
Subsetting Data
- [ single square brackets: e.g. x[1] extracts the first item from vector x. Applied to a list, the returned value will be another list().
- [ with two arguments: e.g. x[1,2] will extract the value in row 1, column 2; x[2, ] will extract the entire second row, and x[, 2] the entire second column.
- [[ double square brackets to extract items from lists.
- $ to access columns or list elements by name
Control Flow
- Use an if condition to start a conditional statement, else if conditions to provide additional tests, and else to provide a default.
- Use == to test for equality.
- X && Y is only true if both X and Y are TRUE.
- X || Y is true if either X or Y, or both, are TRUE.
- Zero is considered FALSE; all other numbers are considered TRUE.
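The subsetting and control-flow rules above can be sketched together as follows. All object names (x, lst, m, value, label, zero_label) are illustrative assumptions.

```r
# Subsetting a vector and a list
x <- c(10, 20, 30)
x[1]                 # 10

lst <- list(n = 1:3, s = "text")
class(lst[1])        # "list": [ returns another list
class(lst[[1]])      # "integer": [[ extracts the item itself
lst$s                # "text"

# Subsetting a matrix with two arguments
m <- matrix(1:6, nrow = 2)
m[1, 2]              # 3: row 1, column 2
m[2, ]               # 2 4 6: the entire second row
m[, 2]               # 3 4: the entire second column

# Conditional statement with a default branch
value <- 0
if (value < 0) {
  label <- "negative"
} else if (value == 0) {
  label <- "zero"
} else {
  label <- "positive"
}

# Logical operators
TRUE && FALSE        # FALSE
TRUE || FALSE        # TRUE

# Zero is treated as FALSE, any other number as TRUE
zero_label <- if (0) "treated as TRUE" else "treated as FALSE"
```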
Creating Publication-Quality Graphics
- library(ggplot2)
- ggplot to create the base figure
- aesthetics (aes) specify the data axes, shape, color, and data size
- geometry functions (geom) specify the type of plot, e.g. point, line, density, box
- geometry functions also add statistical transforms, e.g. geom_smooth
- scale functions change the mapping from data to aesthetics
- facet functions stratify the figure into panels
- aesthetics apply to individual layers, or can be set for the whole plot inside ggplot.
- theme functions change the overall look of the plot
- ggsave to save a figure.
Vectorization
- * applies element-wise to matrices
- %*% for true matrix multiplication
- any() will return TRUE if any element of a vector is TRUE
- all() will return TRUE if all elements of a vector are TRUE
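To make the difference between * and %*% concrete, here is a small sketch with two 2-by-2 matrices. The objects x, a, and b are invented for illustration.

```r
x <- 1:4
x * 2          # 2 4 6 8: the operation applies to every element

a <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
b <- matrix(5:8, nrow = 2)   # columns are (5, 6) and (7, 8)

elementwise <- a * b         # multiplies matching positions: 5 12 / 21 32 by column
product <- a %*% b           # true matrix product: 23 34 / 31 46 by column

any(x > 3)     # TRUE: at least one element passes
all(x > 3)     # FALSE: not every element passes
```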
Functions Explained
- ?"function"
- Use return to return a value explicitly.
Writing Data
- write.table to write out objects in regular format
- Set quote=FALSE so that text isn’t wrapped in " marks
Split-Apply-Combine
- Use the xxply family of functions to apply functions to groups within some data.
- The first letter (a for array, d for data.frame, or l for list) corresponds to the input data.
- Use the plyr family of functions on groups within data.
Dataframe Manipulation with dplyr
- library(dplyr)
- ?select to extract variables by name.
- ?filter return rows with matching conditions.
- ?group_by group data by one or more variables.
- ?summarize summarize multiple values to a single value.
- ?mutate add new variables to a data.frame.
- ?"%>%" pipe operator.
Dataframe Manipulation with tidyr
- library(tidyr)
Comments in R start with a # character and run to the end of the line; comments in SQL start with --, and other languages have other conventions.