2.7 Other Useful Things
Alright, we now have a basic understanding of the most fundamental operations in R programming, most of which are things we will frequently use in this course. Next, we’ll introduce a few more useful concepts and some commonly used functions.
2.7.1 Workspace
In R, the workspace refers to the environment where all objects (such as variables, functions, and data) are stored during an R session. It acts as a storage area that retains the data and objects you create, allowing you to work with them without needing to re-import or redefine them every time you start R. The most common scenario is when you’ve worked hard all day and want to take a break, but if you close R, all the objects in your working environment (memory) will disappear. In this case, you can save your current working environment as a workspace file, which has a .RData extension.
There are two ways to save your working environment as a workspace file. First, with the mouse, you can click Session -> Save Workspace As... in RStudio. Or you can do it with a command:
save.image("FileName.RData")
The next day, after enjoying the morning sunshine (if conditions permit) and your coffee, you can load this file and continue your hard work!
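As a minimal sketch of the full save-and-restore cycle, the following writes the workspace to a temporary file, so it can be run without touching your own files. The object names `scores` and `label` and the temporary path are placeholders chosen for illustration. The sketch assumes it is run at the top level of an R session, because `save.image` saves the objects of the global environment.

```r
# Create two objects in the workspace (names are illustrative)
scores = c(90, 85, 77)
label = "day one"

# Save the whole workspace to a temporary .RData file
ws_file = tempfile(fileext = ".RData")
save.image(ws_file)

# Simulate closing R: remove the objects we just created
rm(scores, label)

# Load the workspace file to get the objects back
load(ws_file)
scores
label
```

In daily use you would replace the temporary path with a file name of your own, such as the "FileName.RData" used above.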
2.7.2 Packages
If R could only be used for scientific computing, it would undoubtedly be overshadowed by numerous other scientific computing programs. The true strength of R lies in its extensibility, which is achieved through R packages. Initially, R packages were primarily written by statisticians to implement new methods, such as lme4 for fitting generalized linear mixed-effects models, survival for conducting survival analysis, psych for psychological research, and so on. However, writing packages is not exclusive to statisticians; an increasing number of non-statistical application packages have also been developed. Today, R has become incredibly versatile through the extension of various packages; for example, this website is written with the quarto package. Below, we will briefly illustrate how to install and load packages using examples.
install.packages("kernlab") # install a new package. Note: the quotation marks are essential.
library(kernlab) # load an installed package with the `library` function
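A common pattern, sketched here under assumptions, is to install a package only when it is missing and then load it. The helper name `load_or_install` is hypothetical (it is not part of R); it only combines the base functions `requireNamespace`, `install.packages`, and `library`. The demonstration uses `stats`, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is not already available,
# then attach it. Returns TRUE invisibly once the package is attached.
load_or_install = function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs an internet connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# `stats` is shipped with R, so this call installs nothing
ok = load_or_install("stats")
ok
"package:stats" %in% search()   # the package is now on the search path
```

The same call with the name of a CRAN package in quotes would install that package on first use and simply load it afterwards.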
2.7.3 Useful Functions
Next, some useful functions are introduced. These functions were extremely useful back when I was a student. However, in the era of RStudio, their usefulness has been greatly reduced. Nonetheless, they are still quite necessary for those who prefer keyboard operations or need to work on a server. In addition, these functions can, to some extent, enhance R users’ understanding of R programming.
ls function: it can list all the objects in the workspace or current environment.
rm function: it can help us to remove objects from the workspace or current environment.
# Example 1
x = 1
rm(x)
# Example 2
rm(list = ls()) # Danger Warning: This command will remove all objects listed by `ls`
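To try `ls` and `rm` without risking your real workspace, one option is to work in a separate environment. The sketch below is only an illustration: `demo_env` is a made-up name, and the `envir` argument restricts both functions to that environment.

```r
# A scratch environment, so nothing in the real workspace is touched
demo_env = new.env()
assign("x", 1, envir = demo_env)
assign("y", "a", envir = demo_env)

before = ls(demo_env)                        # names of all objects in demo_env
before

rm("x", envir = demo_env)                    # remove a single object
after_one = ls(demo_env)
after_one

rm(list = ls(demo_env), envir = demo_env)    # remove everything that is left
after_all = ls(demo_env)
length(after_all)
```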
str function: it displays the structure of an object.
# Example 1
x = list()
x[[1]] = 1:10
x[[2]] = letters[4:10]
str(x)
# Example 2
res = t.test(rnorm(30)) # do a one-sample t-test and save the results in `res`
str(res)
# You can see that the test results are saved in a list of 10.
# If you want to extract elements from it, the information conveyed by `str` is ideal.
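Once `str` has shown the element names, individual results can be pulled out by name. The following is a small sketch of that step; the seed is an arbitrary choice that only makes the random sample reproducible.

```r
set.seed(1)                  # arbitrary seed, for reproducibility only
res = t.test(rnorm(30))

names(res)                   # the element names that str(res) also displays
p = res$p.value              # extract the p-value with `$`
ci = res[["conf.int"]]       # extract the confidence interval with `[[ ]]`
p
ci
res$estimate                 # the sample mean
```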
summary function: it summarizes useful information from an R object. The information extracted depends on the type of the object. For example:
# Example 1
dat = iris[,-5] # we use the first 4 variables from the iris data
summary(dat) # `dat` is a data frame, so the summarized information is...
# Example 2
res = t.test(rnorm(30))
summary(res)
# `res` holds the results of a t-test. The designer of this function decided
# to show the names of all the elements in `res`, similar to the output of `str`
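To see how the output of `summary` changes with the type of its input, here is a small sketch comparing a numeric vector and a factor. The vectors `num` and `fac` are made-up data for illustration.

```r
num = c(2, 4, 4, 5, 10)
fac = factor(c("a", "b", "a", "c", "a"))

s_num = summary(num)   # numeric vector: minimum, quartiles, median, mean, maximum
s_fac = summary(fac)   # factor: number of observations in each level

s_num
s_fac
```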
unique and table functions: they are useful when you want to check all possible values in a variable and the frequency of different possible values.
# First, we create a small demo dataset
treatment = c(1,1,0,0,1,1,0,0)
block = c(1,1,1,1,2,2,2,2)
sex = c("F","M","M","M","F","F","M","F")
age = c(19,20,28,22,21,19,23,20)
outcome = c(20,19,33,12,54,87,98,84)
Dat = data.frame(treatment, block, sex, age, outcome)
head(Dat, 8)
# Example 1:
unique(Dat$sex)
table(Dat$sex)
unique(Dat$age)
table(Dat$age)
# Example 2:
table(Dat$sex, Dat$treatment) # do you know what this kind of output is called?
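The result of `table` is itself an object that can be stored and indexed by the observed values. A short sketch, re-creating two of the demo variables so that it runs on its own:

```r
sex = c("F", "M", "M", "M", "F", "F", "M", "F")
treatment = c(1, 1, 0, 0, 1, 1, 0, 0)

counts = table(sex)            # one-way frequency table
counts[["F"]]                  # the count for a single value

tab = table(sex, treatment)    # two-way cross-tabulation
tab
tab["F", "1"]                  # number of "F" rows with treatment equal to 1
rowSums(tab)                   # row totals, the same counts as table(sex)
```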
which function: it finds the indices of the elements that satisfy a condition in a vector, matrix, or data frame.
# Example: Use the same demo data above
which(Dat$sex == "M")
which(Dat$age < 21)
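The indices returned by `which` are most useful when fed back into `[ ]` to pick out the matching elements. A minimal sketch with the same `age` and `sex` values, re-created here so that it runs on its own:

```r
age = c(19, 20, 28, 22, 21, 19, 23, 20)
sex = c("F", "M", "M", "M", "F", "F", "M", "F")

young = age < 21       # logical vector: TRUE where the condition holds
idx = which(young)     # positions of the TRUE values
idx
age[idx]               # use the positions to pick out the matching ages
sex[idx]               # or the matching entries of another variable
```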
apply function: it is used to perform operations on rows or columns of matrices, data frames, or higher-dimensional arrays. It allows you to apply a function across the rows or columns without writing an explicit loop, which makes the code more concise and often easier to read.
# Syntax:
apply(X, MARGIN, FUN)
# `MARGIN` is an integer specifying whether to apply `FUN` across rows (1) or columns (2)
# Examples: Use the demo data but ignore the variable `sex`
Dat = Dat[, -3]
apply(Dat, 2, mean)
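The difference between the two margins is easiest to see on a tiny matrix. The matrix `m` below is made-up data for illustration:

```r
m = matrix(1:6, nrow = 2)        # 2 rows, 3 columns, filled column by column
m

row_sums = apply(m, 1, sum)      # margin 1: one result per row
col_means = apply(m, 2, mean)    # margin 2: one result per column

row_sums
col_means
```

With margin 1 the result has one entry per row (here 2), and with margin 2 one entry per column (here 3).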
Next, we show some useful functions for graphics. The ggplot2 package is definitely the top choice for plotting, but sometimes the following functions are more practical and convenient for data visualization. I will only list them below, and you are already strong enough to investigate them by yourself :)
hist function: it can help us check the distribution of a variable.
plot function: it is usually used to show the scatter plot of two variables.
pairs function: it shows the pairwise scatter plots of many variables.
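As a starting point for your own investigation, here is a minimal sketch that calls each of the three functions on the built-in iris data. The choice of variables, titles, and axis labels is arbitrary.

```r
dat = iris[, -5]   # the four numeric variables of the iris data

# Distribution of a single variable; the histogram object is returned invisibly
h = hist(dat$Sepal.Length, main = "Sepal length", xlab = "Sepal.Length")

# Scatter plot of two variables
plot(dat$Sepal.Length, dat$Petal.Length,
     xlab = "Sepal.Length", ylab = "Petal.Length")

# Pairwise scatter plots of all four variables
pairs(dat)
```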