Data structure

Typically we start with raw (untidy) data, so the first step is to clean and organize it. To do this, we need to understand the three principles of tidy data:

each variable is a single column
each observation is a single row
each value is a single cell
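
As a minimal sketch of these three rules, the example below reshapes a small untidy table into tidy form. The table itself is hypothetical (the model names and mileage values are made up for illustration), and the use of pivot_longer() from the tidyr package is one possible approach, assuming tidyr is installed.

```r
library(tidyr)

# Hypothetical untidy table: the variable "year" is spread across
# two column headers instead of living in its own column.
untidy <- data.frame(
  model  = c("car_a", "car_b"),
  `1999` = c(18, 16),
  `2008` = c(20, 19),
  check.names = FALSE  # keep the numeric-looking column names as they are
)

# Gather the year columns so that each variable is a single column
# and each (model, year) observation is a single row.
tidy <- pivot_longer(
  untidy,
  cols      = c("1999", "2008"),
  names_to  = "year",
  values_to = "cty"
)

# The year arrives as text (it was a column name), so convert it.
tidy$year <- as.integer(tidy$year)

print(tidy)
```

After reshaping, the table has one row per model and year, with the columns model, year and cty, so each value sits in its own cell.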

In this supporting material, we will use the mpg dataset, which ships with the ggplot2 package in the tidyverse collection. It records the fuel economy of popular car models from 1999 to 2008 and contains 234 observations and 11 variables.
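
A short sketch of how the dataset can be inspected, assuming the ggplot2 package is installed (loading the whole collection with library(tidyverse) would attach it as well):

```r
library(ggplot2)  # mpg is bundled with ggplot2

# Number of rows (observations) and columns (variables)
dim(mpg)

# Names of the variables
names(mpg)

# First 10 rows of the dataset
head(mpg, 10)
```

Because mpg is already tidy, each of its rows describes one car configuration and each of its columns holds one variable.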

MPG dataset (tidyverse), first 10 rows
manufacturer	model	displ	year	cyl	trans	drv	cty	hwy	fl	class
audi	a4	1.8	1999	4	auto(l5)	f	18	29	p	compact
audi	a4	1.8	1999	4	manual(m5)	f	21	29	p	compact
audi	a4	2.0	2008	4	manual(m6)	f	20	31	p	compact
audi	a4	2.0	2008	4	auto(av)	f	21	30	p	compact
audi	a4	2.8	1999	6	auto(l5)	f	16	26	p	compact
audi	a4	2.8	1999	6	manual(m5)	f	18	26	p	compact
audi	a4	3.1	2008	6	auto(av)	f	18	27	p	compact
audi	a4 quattro	1.8	1999	4	manual(m5)	4	18	26	p	compact
audi	a4 quattro	1.8	1999	4	auto(l5)	4	16	25	p	compact
audi	a4 quattro	2.0	2008	4	manual(m6)	4	20	28	p	compact