Handling Missing & Duplicate Data in R

Dealing with missing values:

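  1. Use the is.na() function to detect missing values, and combine it with subset() to select rows that do not contain missing values, e.g. subset(df, !is.na(x)).
  2. Use the na.omit() function to remove rows that contain a missing value in any column.
  3. Use the complete.cases() function to identify rows with no missing values. It returns a logical vector, so use it as a row index to keep only the complete rows, e.g. df[complete.cases(df), ].
  4. Fill in missing values with the mean or median of the column, computed with mean() or median(). Pass na.rm = TRUE, otherwise the result is itself NA.
  5. Fill in missing values by interpolation, using na.approx() (linear) or na.spline() (cubic spline) from the zoo package.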
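
The five approaches above can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe: the data frame df and its columns id and score are hypothetical, and the interpolation step assumes the zoo package is installed (it is skipped otherwise).

```r
# Hypothetical sample data: one numeric column with two missing values.
df <- data.frame(
  id    = 1:6,
  score = c(10, NA, 30, 40, NA, 60)
)

# 1. Detect missing values, then keep only rows where score is present.
missing_flags <- is.na(df$score)        # logical vector, TRUE where NA
n_missing     <- sum(missing_flags)     # number of missing values
kept_subset   <- subset(df, !is.na(score))

# 2. Drop every row that has an NA in any column.
kept_omit <- na.omit(df)

# 3. complete.cases() returns a logical vector; use it as a row index.
kept_complete <- df[complete.cases(df), ]

# 4. Impute with the column mean or median.
#    na.rm = TRUE is needed, otherwise mean() and median() return NA.
filled_mean <- df
filled_mean$score[is.na(filled_mean$score)] <- mean(df$score, na.rm = TRUE)

filled_median <- df
filled_median$score[is.na(filled_median$score)] <- median(df$score, na.rm = TRUE)

# 5. Linear interpolation with zoo::na.approx(), only if zoo is available.
if (requireNamespace("zoo", quietly = TRUE)) {
  filled_interp <- zoo::na.approx(df$score)
}
```

Which approach fits depends on the data: dropping rows is simplest but discards information, mean or median imputation keeps every row but flattens variation, and interpolation only makes sense when the rows have a meaningful order, such as a time series.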

Dealing with duplicate data:

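  1. Use the duplicated() function to detect duplicates. It returns a logical vector that is TRUE for every row (or element) that repeats an earlier one.
  2. Use the unique() function to return the data with duplicate rows or elements removed.
  3. Use the subset() function together with duplicated() to select rows that are not duplicates, e.g. subset(df, !duplicated(df)).
  4. Use the distinct() function from the dplyr package to keep only distinct rows.
  5. Use the aggregate() function to collapse duplicate keys into a single summarized row, or distinct() from the dplyr package with selected columns to keep one row per key.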
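
A short sketch of the duplicate-handling options above. The data frame df and its columns name and value are hypothetical, and the distinct() calls assume the dplyr package is installed (they are skipped otherwise).

```r
# Hypothetical sample data: row 3 is an exact copy of row 1,
# and "b" appears twice with different values.
df <- data.frame(
  name  = c("a", "b", "a", "c", "b"),
  value = c(1, 2, 1, 3, 5)
)

# 1. duplicated() flags rows that repeat an earlier row.
dup_flags <- duplicated(df)

# 2. unique() drops the repeated rows.
deduped_unique <- unique(df)

# 3. subset() combined with !duplicated() gives the same rows.
deduped_subset <- subset(df, !duplicated(df))

# Variation: keep only the first row for each name (key-based de-duplication).
first_per_name <- df[!duplicated(df$name), ]

# 5. aggregate() collapses rows that share a name, here by summing value.
summed <- aggregate(value ~ name, data = df, FUN = sum)

# 4. dplyr::distinct(), only if dplyr is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  deduped_distinct <- dplyr::distinct(df)                          # whole-row duplicates
  first_distinct   <- dplyr::distinct(df, name, .keep_all = TRUE)  # one row per name
}
```

Note the difference between the two kinds of result: unique(), subset() with duplicated(), and distinct() discard repeated rows, while aggregate() keeps information from every row by combining it with a summary function.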