2.4.3 Data Frame
R was originally designed for statisticians, so importing data and storing it in a suitable structure is essential. In R, that structure is called a data frame. We can use various functions to read in different types of data, such as txt, csv, xlsx, and more. For example, you can apply the read.table function to import data saved in a txt file. You can download the Boston data here.
# Prepare a txt file, 'Boston.txt', in a directory
setwd("dir containing your data")
# we set the dir containing your data as the working directory.
dat = read.table("Boston.txt", sep=",", header = TRUE)
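To try read.table without downloading anything, the following minimal sketch writes a tiny comma-separated file to a temporary location and reads it back. The object name demo_dat, the column names (crim, rm, medv), and the values are made-up stand-ins for illustration; they are not the real Boston data.

```r
# Create a small comma-separated text file in a temporary location.
# The three data rows below are made-up values used only for illustration.
tmp_file = tempfile(fileext = ".txt")
writeLines(c("crim,rm,medv",
             "0.01,6.5,24.0",
             "0.03,6.4,21.6",
             "0.02,7.1,34.7"), tmp_file)

# sep = "," tells read.table how the columns are separated;
# header = TRUE says the first line holds the variable names.
demo_dat = read.table(tmp_file, sep = ",", header = TRUE)

class(demo_dat)  # the imported object is a data frame
dim(demo_dat)    # number of rows and columns
str(demo_dat)    # type of each column
```

If header = TRUE were omitted, the first line would be read as data rather than as variable names, so it is worth checking the result with str after every import.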
Remark: Setting the working directory (WD) is always useful since it simplifies many things. For example, if we don't set the WD to the folder containing 'Boston.txt', then we have to specify the full path in the first argument of the read.table function. The WD can be set with the function setwd and checked with the function getwd. In RStudio, this can also be done with the mouse; see the figure below.
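The effect of the WD can be sketched as follows. This is only an illustration: the temporary folder stands in for the folder containing your data, and the file name wd_demo.txt is hypothetical.

```r
old_wd = getwd()     # remember the current working directory
setwd(tempdir())     # switch to a temporary folder (stand-in for your data folder)
getwd()              # check where we are now

# The file lives in the WD, so its name alone is enough.
writeLines(c("a,b", "1,2"), "wd_demo.txt")
in_wd = read.table("wd_demo.txt", sep = ",", header = TRUE)

setwd(old_wd)        # go back to the original working directory

# Without changing the WD, the full path must be given instead.
full_path = file.path(tempdir(), "wd_demo.txt")
by_path = read.table(full_path, sep = ",", header = TRUE)

identical(in_wd, by_path)  # both approaches read the same data
```

Saving the old WD before calling setwd and restoring it afterwards keeps the rest of the session unaffected.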
A data frame is a fundamental data structure for storing tabular data, where each column can hold a different type of data (e.g., numeric, character, or factor). A data frame can be created with the function data.frame. For example:
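# x1 and x2 are assumed to be already defined (numeric vectors of the same length)
(X = cbind(x1,x2))
(dat = data.frame(x1,x2))
# when printed, there seems to be no difference between a matrix and a data frame
class(X)
class(dat) # but the classes are different
X%*%t(X) # try this
dat%*%t(dat) # try this -> matrix multiplication is not allowed.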
So, in general, the operations and functions for matrices cannot be applied to a data frame. The main difference between a data frame and a matrix is that a data frame can include different types of data. For example:
# with the same demo data above
x3 = letters[1:3] # define another variable
X = cbind(x1, x2, x3)
dat = data.frame(x1, x2, x3)
X
dat # compare `X` and `dat`, draw your conclusions.
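The comparison can be made explicit by inspecting the types directly. The sketch below is self-contained, so the values of x1 and x2 are assumptions chosen for illustration. The as.matrix step at the end is not part of the text above; it is one possible way to get back to matrix operations for the numeric columns.

```r
# Hypothetical demo vectors (the values are assumptions for illustration).
x1 = c(1, 2, 3)
x2 = c(4, 5, 6)
x3 = letters[1:3]

X = cbind(x1, x2, x3)          # matrix: all entries are coerced to one type
dat = data.frame(x1, x2, x3)   # data frame: each column keeps its own type

typeof(X)            # a single type for the whole matrix
sapply(dat, class)   # one class per column

# Matrix multiplication needs a numeric matrix, so convert the
# numeric columns of the data frame first.
M = as.matrix(dat[, c("x1", "x2")])
M %*% t(M)
```

Because a matrix can hold only one type, adding the character vector x3 turns the numbers in X into character strings, while the data frame keeps x1 and x2 numeric.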
For a data frame, we can still slice in the same way as for a matrix. Another, more practical way is to use $ to slice. For example:
# with the same example above
dat[,3]
dat$x3
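A few more slicing patterns, shown as a small self-contained sketch. The data frame below is a hypothetical stand-in with the same column names as the example above.

```r
# Hypothetical stand-in for the demo data frame.
dat = data.frame(x1 = c(1, 2, 3), x2 = c(4, 5, 6), x3 = letters[1:3])

dat[2, ]       # second row, all columns (matrix-style indexing)
dat[, 3]       # third column, returned as a vector
dat[, "x3"]    # the same column, selected by name
dat$x3         # the same column, using $

# Rows where x1 is larger than 1, keeping only two of the columns.
sub = dat[dat$x1 > 1, c("x1", "x3")]
sub
```

Selecting by name, with either $ or a quoted column name, is usually more robust than selecting by position, since it still works if the column order changes.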
Some useful functions for data frames
head and tail functions: they help us check the first and last few rows, respectively. For example:
dat = iris # iris is a pre-stored data set in R which includes 150 iris flowers
head(dat)
tail(dat)
head(dat, 10)
names function: it helps us quickly check the names of all variables.
attach and detach functions: some people find it inconvenient to use $ to slice a data frame and want to use the variable names directly. In this case, the attach function puts us into such a mode, and the detach function ends it. For example:
dat = iris
names(iris) # [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
Species # you can't find it
dat$Species # works
attach(dat)
Species # also works
detach(dat)
Species # not found again
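Because typing a name that cannot be found raises an error, the same idea can also be checked with exists, which returns TRUE or FALSE instead. This is a minimal sketch; the data frame toy and its column height_cm are hypothetical names chosen so that they do not clash with anything else in the session.

```r
# Hypothetical data frame with a single, uniquely named column.
toy = data.frame(height_cm = c(150, 160, 170))

before = exists("height_cm")   # is the column visible by its bare name?

attach(toy)                    # put the columns of toy on the search path
during = exists("height_cm")
avg = mean(height_cm)          # the bare name can now be used directly

detach(toy)                    # leave this mode again
after = exists("height_cm")

c(before = before, during = during, after = after)
```

Note that attach works on a copy of the data frame: changes made to toy after attaching are not seen through the bare variable names, which is a common source of confusion.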