Handling Missing & Duplicate Data in R
Dealing with missing values:
- Use the is.na() function to detect missing values, and combine it with the subset() function to keep only the rows where a given column is not missing, for example subset(df, !is.na(x)).
- Use the na.omit() function to remove rows containing missing values.
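- Use the complete.cases() function to flag rows that contain no missing values. It returns a logical vector rather than removing anything, so index with it to keep only complete rows, for example df[complete.cases(df), ].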
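- Fill in missing values with the column mean or median, computed with mean() or median(). Pass na.rm = TRUE, otherwise both functions return NA when the column contains missing values.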
- Fill in missing values by interpolation with na.approx() (linear) or na.spline() (cubic spline). Both functions come from the zoo package, not base R.
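The options above can be sketched on a small data frame. This is a minimal illustration, not a recommended pipeline: the data frame df and its columns id, score, and group are hypothetical, and the interpolation step runs only if the zoo package happens to be installed.

```r
# Hypothetical example data; the column names are illustrative only.
df <- data.frame(
  id    = 1:5,
  score = c(10, NA, 30, NA, 50),
  group = c("a", "b", NA, "b", "a")
)

# Detect missing values: count the NAs in each column.
na_counts <- colSums(is.na(df))

# Keep only the rows where one specific column is present.
has_score <- subset(df, !is.na(score))

# Drop every row that has an NA in any column (two ways to get the same rows).
complete_a <- na.omit(df)
complete_b <- df[complete.cases(df), ]

# Impute with the column mean or median. na.rm = TRUE is required,
# otherwise mean() and median() return NA themselves.
mean_filled <- df
mean_filled$score[is.na(mean_filled$score)] <- mean(df$score, na.rm = TRUE)

median_filled <- df
median_filled$score[is.na(median_filled$score)] <- median(df$score, na.rm = TRUE)

# Interpolate between neighbouring values, assuming zoo is available.
# na.rm = FALSE keeps the result the same length as the input.
if (requireNamespace("zoo", quietly = TRUE)) {
  linear_filled <- zoo::na.approx(df$score, na.rm = FALSE)
  spline_filled <- zoo::na.spline(df$score, na.rm = FALSE)
}
```

Which option fits depends on the data: dropping rows is simplest but discards information, mean or median imputation keeps every row but shrinks the variance, and interpolation only makes sense when the rows have a meaningful order, such as a time series.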
Dealing with duplicate data:
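- Use the duplicated() function to detect duplicate data. It returns a logical vector that is TRUE for the second and later occurrences of an element or row, so the first occurrence is not flagged.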
- Use the unique() function to return a vector or data frame with duplicate elements or rows removed.
- Use the subset() function together with duplicated() to select rows that are not duplicates, for example subset(df, !duplicated(df)).
- Use the distinct() function from the dplyr package to eliminate duplicate rows.
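- When rows share a key but differ in other columns, use the base R aggregate() function to collapse them into one summary row per key, or use distinct() from the dplyr package with .keep_all = TRUE to keep the first row for each key.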
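The duplicate-handling functions above can be sketched the same way. The data frame df and its columns id and value are hypothetical, chosen so that one row is an exact duplicate and one id appears with two different values. The dplyr calls run only if that package is installed.

```r
# Hypothetical example data: row 3 repeats row 2 exactly,
# and id 3 appears twice with different values.
df <- data.frame(
  id    = c(1, 2, 2, 3, 3),
  value = c(10, 20, 20, 30, 35)
)

# Flag exact duplicate rows; only later occurrences are TRUE.
dup_flags <- duplicated(df)

# Remove exact duplicate rows (two base R ways to get the same rows).
unique_rows <- unique(df)
no_dups     <- subset(df, !duplicated(df))

# Keep the first row for each id, even when other columns differ.
first_per_id <- df[!duplicated(df$id), ]

# Collapse rows that share an id into one summary row (here, the mean).
mean_per_id <- aggregate(value ~ id, data = df, FUN = mean)

# The dplyr equivalents, assuming dplyr is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  distinct_rows <- dplyr::distinct(df)
  distinct_ids  <- dplyr::distinct(df, id, .keep_all = TRUE)
}
```

Dropping exact duplicates is usually safe, while keeping the first row per key or averaging values changes the data, so the choice between first_per_id and mean_per_id depends on what the repeated keys actually represent.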