2.7 Other Useful Things

Alright, we now have a basic understanding of the most fundamental operations in R programming, most of which are things we will frequently use in this course. Next, we’ll introduce a few more useful concepts and some commonly used functions.

2.7.1 Workspace

In R, the workspace refers to the environment where all objects (such as variables, functions, and data) are stored during an R session. It acts as a storage area that retains the data and objects you create, allowing you to work with them without needing to re-import or redefine them every time you start R. The most common scenario is when you’ve worked hard all day and want to take a break, but if you close R, all the objects in your working environment (memory) will disappear. In this case, you can save your current working environment as a workspace file, which has a .RData extension.

There are two ways to save your working environment as a workspace file. With the mouse, you can click Session -> Save Workspace As... in RStudio. Alternatively, you can use the command:

save.image("FileName.RData")

The next day, after enjoying the morning sunshine (if conditions permit) and your coffee, you can load this file and continue your hard work!
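As a rough sketch of the full save-and-restore cycle: the file path and the object `score` below are made up for illustration, and a temporary directory is used so that no real file is overwritten. Loading is done with the base function `load`.

```r
# Hypothetical file path inside a temporary directory (an assumption for this demo)
ws_file = file.path(tempdir(), "my_workspace.RData")

score = c(90, 85, 77)   # an object we want to keep
save.image(ws_file)     # save all objects in the global environment

rm(score)               # simulate closing R: the object is gone
exists("score")         # check whether `score` still exists

load(ws_file)           # restore the saved objects
score                   # the object is back
```

In RStudio, Session -> Load Workspace... should do the same thing with the mouse.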

2.7.2 Packages

If R could only be used for scientific computing, it would undoubtedly be overshadowed by numerous other scientific computing programs. The true strength of R lies in its extensibility, which is achieved through R packages. Initially, R packages were primarily written by statisticians to implement new methods, such as lme4 for fitting generalized linear mixed-effects models, survival for conducting survival analysis, psych for psychological research, and so on. However, writing packages is not exclusive to statisticians; an increasing number of non-statistical application packages have also been developed. Today, packages have made R incredibly versatile; for example, this website was written with the quarto package. Below, we briefly illustrate how to install and load a package.

install.packages("kernlab")
# Install a new package. Note: the quotation marks are essential.

library(kernlab)
# Load an installed package with the function `library`.
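Calling `library` on a package that is not installed stops with an error, so it can be handy to check first. The following is only a sketch of one possible pattern, using the base function `requireNamespace`; the variable name `pkg` is an illustrative choice.

```r
pkg = "kernlab"  # name of the package we want to use

# requireNamespace() returns TRUE if the package is installed and FALSE otherwise
if (!requireNamespace(pkg, quietly = TRUE)) {
  message("Package '", pkg, "' is not installed; run install.packages(\"", pkg, "\") first.")
} else {
  # character.only = TRUE lets us pass the package name as a character string
  library(pkg, character.only = TRUE)
}
```

A package only needs to be installed once, but it has to be loaded with `library` in every new R session.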

2.7.3 Useful Functions

Next, some useful functions are introduced. These functions were extremely useful back when I was a student. However, in the era of RStudio, their usefulness has been greatly reduced. Nonetheless, they are still quite necessary for those who prefer keyboard operations or need to work on a server. In addition, these functions can, to some extent, enhance R users’ understanding of R programming.

  • ls function: it can list all the objects in the workspace or current environment.

  • rm function: it can help us to remove objects from the workspace or current environment.

# Example 1
x = 1
rm(x) 
# Example 2
rm(list = ls()) # Danger Warning: This command will remove all objects listed by `ls`
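A gentler alternative to removing everything is to combine `ls` and `rm` with the `pattern` argument of `ls`. This is a small sketch; the object names are invented for the demonstration.

```r
# Three demo objects (hypothetical names)
tmp_a = 1
tmp_b = "text"
keep_me = 3

ls(pattern = "^tmp_")             # list only the objects whose names start with "tmp_"
rm(list = ls(pattern = "^tmp_"))  # remove only those objects

exists("tmp_a")    # check whether `tmp_a` is still there
exists("keep_me")  # check whether `keep_me` is still there
```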
  • str function: it displays the structure of an object.
# Example 1
x = list()
x[[1]] = 1:10
x[[2]] = letters[4:10]
str(x)
# Example 2
res = t.test(rnorm(30)) # do one sample t-test and save results in `res`
str(res) 
# You can see that the test results are stored in a list of 10 elements.
# If you want to extract elements from it, the information conveyed by `str` is ideal.
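The element names reported by `str` can be used with `$` to pull out individual results. A minimal sketch, where the seed is an arbitrary choice made only for reproducibility:

```r
set.seed(1)               # arbitrary seed, only so the numbers are reproducible
res = t.test(rnorm(30))   # one-sample t-test, as in Example 2

names(res)     # element names, the same ones listed by str(res)
res$p.value    # p-value of the test
res$conf.int   # confidence interval for the mean
res$estimate   # estimated mean
```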
  • summary function: it summarizes useful information from an R object. The information extracted depends on the type of the object. For example:
# Example 1
dat = iris[, -5] # we use the first 4 variables from the iris data
summary(dat) # `dat` is a data frame, so `summary` reports the minimum, quartiles, mean, and maximum of each variable
# Example 2
res = t.test(rnorm(30))
summary(res) 
# `res` holds the results of a t-test. The designer of this function decided
# to show the names of all the elements in `res`, similar to the output of `str`.
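To see how the output of `summary` depends on the type of its input, it may help to compare a few simple objects. The sketch below uses columns of the built-in iris data; the variable names are illustrative.

```r
# Numeric vector: minimum, quartiles, mean, and maximum
num_summary = summary(iris$Sepal.Length)
num_summary

# Factor: number of observations in each level
fac_summary = summary(iris$Species)
fac_summary

# Logical vector: counts of FALSE and TRUE
summary(iris$Sepal.Length > 5)
```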
  • unique and table functions: they are useful when you want to check all possible values in a variable and the frequency of different possible values.
# First, we create a small demo dataset
treatment = c(1,1,0,0,1,1,0,0)
block = c(1,1,1,1,2,2,2,2)
sex = c("F","M","M","M","F","F","M","F")
age = c(19,20,28,22,21,19,23,20)
outcome = c(20,19,33,12,54,87,98,84)
Dat = data.frame(treatment, block, sex, age, outcome)
head(Dat, 8)

# Example 1:
unique(Dat$sex)
table(Dat$sex)
unique(Dat$age)
table(Dat$age)

# Example 2:
table(Dat$sex, Dat$treatment) # do you know what this kind of table is called?
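A two-way table can be stored and processed further. The sketch below recreates the two relevant variables so that it runs on its own; `addmargins` and `prop.table` are base R functions, and the name `tab` is an illustrative choice.

```r
# Same values as in the demo dataset
sex = c("F","M","M","M","F","F","M","F")
treatment = c(1,1,0,0,1,1,0,0)

tab = table(sex, treatment)  # rows: sex, columns: treatment
tab["F", "1"]                # count of females who received treatment 1
addmargins(tab)              # add row and column totals
prop.table(tab, 1)           # proportions within each row (each sex)
```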
  • which function: it returns the indices of the elements that satisfy a condition in a vector, matrix, or data frame.
# Example: Use the same demo data above
which(Dat$sex == "M")
which(Dat$age < 21)
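The indices returned by `which` are typically used to subset other objects. A small sketch with the same ages and outcomes as in the demo data; the matrix `m` is an extra illustration of the `arr.ind` argument.

```r
age = c(19,20,28,22,21,19,23,20)
outcome = c(20,19,33,12,54,87,98,84)

idx = which(age < 21)  # positions of the observations younger than 21
idx
outcome[idx]           # outcomes of exactly those observations

# For a matrix, arr.ind = TRUE returns row and column indices
m = matrix(1:6, nrow = 2)
which(m > 4, arr.ind = TRUE)
```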
  • apply function: it applies a function to the rows or columns of a matrix, data frame, or higher-dimensional array. It lets you avoid writing an explicit loop, which makes code more concise and often easier to read.
# Syntax:
# apply(X, MARGIN, FUN)
# `MARGIN` is an integer specifying whether `FUN` is applied to each row (1) or each column (2)
# Examples: Use the demo data but ignore the variable `sex`
Dat = Dat[, -3]
apply(Dat, 2, mean)
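To make the role of the margin concrete, here is a sketch on a tiny matrix whose results are easy to check by hand. It also shows that extra arguments placed after the function are passed on to it. All object names are illustrative.

```r
m = matrix(1:6, nrow = 2)    # 2 rows, 3 columns, filled column by column

row_sums = apply(m, 1, sum)  # margin 1: one result per row
col_max  = apply(m, 2, max)  # margin 2: one result per column

# Extra arguments are handed to the function, here na.rm = TRUE for mean()
m_na = m
m_na[1, 1] = NA
col_means = apply(m_na, 2, mean, na.rm = TRUE)

row_sums
col_max
col_means
```

For simple cases such as sums and means, base R also offers `rowSums`, `colSums`, `rowMeans`, and `colMeans`.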

Next, we show some useful functions for graphics. The ggplot2 package is definitely the top choice for plotting, but sometimes the following functions are more practical and convenient for quick data visualization. I will only list them below, and you are already strong enough to investigate them by yourself :)

  • hist function: it can help us check the distribution of a variable.

  • plot function: it is usually used to show the scatter plot of two variables.

  • pairs function: it shows the pairwise scatter plot of many variables.
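As a possible starting point for that investigation, here is a minimal sketch that calls each of the three functions on the built-in iris data. The titles and axis labels are arbitrary choices.

```r
# Histogram of one variable; hist() also returns the bin counts invisibly
h = hist(iris$Sepal.Length, main = "Sepal length", xlab = "Sepal.Length")

# Scatter plot of two variables
plot(iris$Sepal.Length, iris$Petal.Length,
     xlab = "Sepal.Length", ylab = "Petal.Length")

# Pairwise scatter plots of the four numeric variables
pairs(iris[, -5])
```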


© 2024 Xijia Liu. All rights reserved.