RStudio Certification

Learner persona

Kelly Katz

General background: Kelly is a medical student who decided to pursue a Master's in Clinical Research before returning to the hospital for her clinical rotations. She is very young and enthusiastic. Although she studied some statistics in high school, this is the first time she is learning how statistics are applied in the health sciences, and that motivates her.
Relevant experience: During her Master's program she learned how to do descriptive statistics and some basic analyses with SPSS, and she feels comfortable with it. She has used R and RStudio in some biostatistics courses but, as she puts it, it was “only copy-paste-enter pieces of code and interpret the output”. So far she has done only a little data wrangling in Excel, because most of her classes provided clean data ready for analysis.
Perceived needs: Kelly has one year to present her thesis, and she will work with observational data on individuals followed over time. She will have to merge and clean all the data files that she requires for her analysis, and she is a bit anxious about it. She heard about the reproducibility crisis in research during some seminars, and she feels that the best way to make her data cleaning process transparent is to do it in R.
Special considerations: Kelly is very enthusiastic about learning, but she is used to learning from books and a clearly specified curriculum. The amount of information available online for learning R overwhelms her and keeps her from focusing, which frustrates her.
Needs: Kelly needs a clear structure and guide to learn R. A step-by-step tutorial on each topic from the R4DS book will help her learn the basics of data wrangling and plotting.

Concept map

The class will introduce the rules of tidy data and the key functions to reshape data from wide to long and vice versa. The remaining concepts in the map will be taught in a following module of the extended class. The students have already studied and used the dplyr package, and they are familiar with the pipe %>%.

Formative assessments

1. Which of these tables meets the 3 rules of tidy data?

Exercise

Table A

country	1999	2000
Afghanistan	745	2666
Brazil	37737	80488
China	212258	213766

Table B

country	year	rate
Afghanistan	1999	745/19987071
Afghanistan	2000	2666/20595360
Brazil	1999	37737/172006362
Brazil	2000	80488/174504898
China	1999	212258/1272915272
China	2000	213766/1280428583

Table C

country	year	cases
Afghanistan	1999	745
Afghanistan	2000	2666
Brazil	1999	37737
Brazil	2000	80488
China	1999	212258
China	2000	213766

Solution

Correct answer is Table C

Misconceptions:

Datasets with repeated measurements over time often look like Table A. Students might have seen and used this type of dataset before and believe that each year represents a new variable, a new property to be measured.
Although each value of rate reflects a state of the variable, rate is not tidy in Table B because each cell contains two numeric values that represent two different variables: cases and population. It would be tidier to keep those two variables in separate columns and calculate the rate using mutate, which gives a single numeric value that can be summarised and plotted.
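Both misconceptions can be checked in code. The following is a minimal sketch, assuming the tidyr and dplyr packages are installed. The object names table_a, table_b, tidy_a and tidy_b are hypothetical, and the data are typed in by hand from the tables in the exercise.

```r
library(tidyr)
library(dplyr)

# Table A from the exercise: one column per year (wide layout).
table_a <- tibble::tribble(
  ~country,      ~`1999`, ~`2000`,
  "Afghanistan",     745,    2666,
  "Brazil",        37737,   80488,
  "China",        212258,  213766
)

# The year columns are values of a single variable, so gather them
# into one `year` column and one `cases` column.
tidy_a <- table_a %>%
  pivot_longer(cols = c(`1999`, `2000`),
               names_to = "year",
               values_to = "cases") %>%
  mutate(year = as.integer(year))

# Table B from the exercise: `rate` stores cases and population as text.
table_b <- tibble::tribble(
  ~country,      ~year, ~rate,
  "Afghanistan",  1999, "745/19987071",
  "Afghanistan",  2000, "2666/20595360",
  "Brazil",       1999, "37737/172006362",
  "Brazil",       2000, "80488/174504898",
  "China",        1999, "212258/1272915272",
  "China",        2000, "213766/1280428583"
)

# Split the text column into two numeric variables, then compute
# the rate as a single numeric value with mutate().
tidy_b <- table_b %>%
  separate(rate, into = c("cases", "population"),
           sep = "/", convert = TRUE) %>%
  mutate(rate = cases / population)
```

After these steps, tidy_a has the same layout as Table C, and tidy_b holds cases, population and a numeric rate in separate columns.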

2. We need to transform Table 1 to look like Table 2.

Exercise

We need to transform Table 1 to look like Table 2. Fill in the blanks and correct the code if necessary:

survey %>% 
  pivot_____(names_from = "____",
              values_from = "____")

Table 1

student	food	rate
1	fruit	5
1	vegetable	1
1	icecream	7
2	fruit	5
2	vegetable	4
2	icecream	3
3	fruit	1
3	vegetable	6
3	icecream	9

Table 2

student	fruit	vegetable	icecream
1	5	1	7
2	5	4	3
3	1	6	9

Solution

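# Spread the values of `food` into columns, filled with `rate`
survey %>%
  pivot_wider(names_from = food,
              values_from = rate) %>% 
  # mytable() is not defined in this section; it appears to be a table-printing helper
  mytable()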

student	fruit	vegetable	icecream
1	5	1	7
2	5	4	3
3	1	6	9
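To try the solution outside the class materials, the steps can be sketched as below. This assumes the tidyr and dplyr packages are installed. The survey table is rebuilt by hand from Table 1 because the original data object is not shown, the names wide and long are hypothetical, and the undefined mytable() helper is left out.

```r
library(tidyr)
library(dplyr)

# Table 1 from the exercise, typed in by hand (long layout).
survey <- tibble::tribble(
  ~student, ~food,       ~rate,
  1,        "fruit",     5,
  1,        "vegetable", 1,
  1,        "icecream",  7,
  2,        "fruit",     5,
  2,        "vegetable", 4,
  2,        "icecream",  3,
  3,        "fruit",     1,
  3,        "vegetable", 6,
  3,        "icecream",  9
)

# Long to wide: one row per student, one column per food (Table 2).
wide <- survey %>%
  pivot_wider(names_from = food,
              values_from = rate)

# Wide to long: the reverse operation recovers the layout of Table 1.
long <- wide %>%
  pivot_longer(cols = -student,
               names_to = "food",
               values_to = "rate")
```

Printing wide should show the layout of Table 2, and long should have the same columns and rows as survey, which illustrates that the two pivot functions undo each other.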

Teaching Exam

L. Paloma Rojas Saunero

2020-01-05
