library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
Dealing with Missing Data
When calculating statistics (e.g. with summarize()), many calculations will return NA instead of a number if your data contains NAs.
Example: Calculating Mean
data_missing <- tribble(
  ~x, ~y,
  2, 3,
  1, 4,
  NA, 2,
  3, NA,
  7, 8
)
Now, if we compute the mean of x:
data_missing %>% 
  summarize(mean_x = mean(x))
## # A tibble: 1 × 1
##   mean_x
##    <dbl>
## 1     NA
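It gives us NA. This is not an error: a single missing value in x makes the sum unknown, so mean() returns NA for the whole column.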
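One way to see where the NA comes from is to look at the column as a plain vector and count its missing values. This is a minimal sketch; x_values is an illustrative name that simply repeats the x column of data_missing.

```r
# Same values as the x column of data_missing (x_values is an illustrative name)
x_values <- c(2, 1, NA, 3, 7)

# Arithmetic involving NA yields NA, so the total is unknown
sum(x_values)          # NA

# is.na() flags each missing value; summing the flags counts them
sum(is.na(x_values))   # 1
```

Checking the count first shows how many observations a summary would lose if the NAs were removed.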
One way to handle this is to drop the missing values before computing the statistic. Most summary functions (like mean()) have an optional argument na.rm; when it is set to TRUE, the NAs in the input are removed before the calculation, so the mean is taken over the remaining values only:
data_missing %>%
  summarize(mean_x = mean(x, na.rm = TRUE))
## # A tibble: 1 × 1
##   mean_x
##    <dbl>
## 1   3.25
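The same approach works for several columns at once. The sketch below rebuilds data_missing so that it runs on its own, then reports each mean alongside the number of values that na.rm = TRUE dropped. The output column names (mean_y, n_missing_x, n_missing_y) and the object name col_summary are illustrative choices, not part of the original example.

```r
library(tibble)
library(dplyr)

# Same data as above, repeated so this block runs on its own
data_missing <- tribble(
  ~x, ~y,
  2, 3,
  1, 4,
  NA, 2,
  3, NA,
  7, 8
)

col_summary <- data_missing %>%
  summarize(
    mean_x = mean(x, na.rm = TRUE),   # mean of 2, 1, 3, 7
    mean_y = mean(y, na.rm = TRUE),   # mean of 3, 4, 2, 8
    n_missing_x = sum(is.na(x)),      # values dropped from x
    n_missing_y = sum(is.na(y))       # values dropped from y
  )

col_summary
```

Note that na.rm works column by column: mean_x and mean_y are each based on four values, but not on the same four rows, since the NA in x and the NA in y sit in different observations.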